Preface


I organized several seminars on cryptography, and the students generally felt that cryptography does not require much mathematics, whereas programming languages and the computing environment matter more. Later, I reviewed several common cryptography textbooks published in China and abroad. As it turned out, these textbooks are written for engineering students, and their purpose is to train cryptographic engineers. My original intention is to write a textbook of theoretical cryptography for mathematics undergraduates and science postgraduates that systematically teaches the statistical characteristics of cryptosystems, the computational complexity theory of cryptographic algorithms, and the mathematical principles behind various encryption and decryption algorithms.

With the rapid development of the new generation of digital technology, China has entered the era of information, networks and intelligence. Cryptography is not only the cornerstone of national security in the information age but also a sharp sword protecting people's property, personal privacy and personal dignity. After establishing Cyberspace Security as a first-class discipline, China has further established security as a first-class discipline. In particular, on December 19, 2019, China officially promulgated the Cryptography Law, enacting a dedicated law for a single discipline, which is rare anywhere in the world. Recently, the central government has explicitly called for cultivating our own cryptography professionals. It can be seen that discipline construction and personnel training in cryptography have been raised to the level of national security and have become a major national strategic demand. Writing a textbook on cryptographic theory in order to cultivate our own cryptographers is the ultimate reason for writing this book.

Cryptography is an ancient art. Cryptosystems have existed for almost as long as human beings. For example, the means of communication used in war and the marks and conventions used by special groups can all be classified as the art of cryptosystems. Among them, the famous Caesar cipher can be regarded as the representative work of ancient cryptography. For thousands of years, cryptography as a technology relied on personal intelligence and ingenuity; mathematical ideas and methods were used only occasionally and in fragments. This era of cryptographers changed fundamentally only with the arrival of the great American mathematician C. E. Shannon.

In 1948 and 1949, Shannon published two epoch-making papers in the Bell System Technical Journal. In the first paper, Shannon established the mathematical theory of communication and, using probability theory, introduced a quantitative measure of information, thus laying the foundation of modern information theory. In the second paper, Shannon established the information-theoretic principles of cryptography, brought the methods of probability and statistics into the design and analysis of cryptosystems, and transformed cryptography from an art into a science. Therefore, Shannon is called not only the father of modern information theory but also the father of modern cryptography.

After Shannon's transformation of the era of cryptographers into the era of cryptographic science, cryptology made its second historic leap in 1976: the era of symmetric cryptography gave way to the era of public key cryptography. In 1976, two Stanford University scholars, W. Diffie and M. Hellman, published a pioneering paper on asymmetric cryptography in IEEE Transactions on Information Theory, which opened the era of public key cryptography. Public key cryptography is even more deeply intertwined with mathematics, making cryptography an inseparable branch of mathematics. The defining feature of the public key era is that cryptography changed from a tool for a few users into a mass consumer product, which greatly improved its efficiency and social value. Nowadays, asymmetric cryptosystems are widely used in message authentication, identity authentication, digital signatures, digital currency and blockchain architectures, where classical cryptosystems cannot replace them.

Based on Shannon's information theory, this book systematically introduces the information theory, statistical characteristics and computational complexity theory of public key cryptography, focusing on the three main public key algorithms: RSA, discrete logarithm and elliptic curve cryptosystems. It strives to explain both what these algorithms are and why they work, and lays a solid theoretical foundation for the design of new cryptosystems, cryptanalysis and attacks.

Lattice-based cryptography is a representative technology of post-quantum cryptography and is recognized by the academic community as able to resist quantum computing attacks. At present, the theory and technology of lattice cryptography have not entered university textbooks; the relevant results and expositions are scattered in research papers published at home and abroad over the past two decades. The greatest feature of this book is that it systematically simplifies and organizes the theory and technology of lattice cryptography into a classroom textbook for senior undergraduates and postgraduates in cryptography, which should play an important role in accelerating the training of modern cryptography talent in China.

This book requires the reader to have a good foundation in algebra, number theory, and probability and statistics. It is suitable for senior undergraduates majoring in mathematics and as a compulsory course text for postgraduates in cryptography, science and engineering. It can also serve as the main reference book for researchers engaged in cryptographic research and cryptographic engineering.

The main contents of this book have been taught in the seminar. My doctoral students Hong Ziwei, Chen Man, Xu Jie and Zhang Mingpei, Associate Professor Huang Wenlin and Dr. Tian Kun have all offered many useful suggestions and much help with the contents of this book. In particular, Chen Man devoted a great deal of time and energy to typesetting and proofreading. I would like to express my deep gratitude to them!


Beijing, China Zhiyong Zheng 
November 2021 


Contents 


1 Preparatory Knowledge
1.1 Injective
1.2 Computational Complexity
1.3 Jensen Inequality
1.4 Stirling Formula
1.5 n-fold Bernoulli Experiment
1.6 Chebyshev Inequality
1.7 …
Exercises

2 The Basis of Code Theory
2.1 …
2.2 …
2.3 …
2.4 Some Typical Codes
2.4.1 Hadamard Codes
2.4.2 Binary Golay Codes
2.4.3 3-Ary Golay Code
2.4.4 Reed-Muller Codes
2.5 Shannon Theorem
Exercises

3 …
3.1 Information Space
3.2 Joint Entropy, Conditional Entropy, Mutual Information
3.3 Redundancy
3.4 Markov Chain
3.5 Source Coding Theorem
3.6 Optimal Code Theory
3.7 Several Examples of Compression Coding
3.7.1 Morse Codes
3.7.2 Huffman Codes
3.7.3 Shannon-Fano Codes
3.8 Channel Coding Theorem
…

4 Cryptosystem and Authentication System
4.1 Definition and Statistical Characteristics of Cryptosystem
4.2 Fully Confidential System
4.3 …
…
4.7.3 Discrete Logarithm
4.7.4 Knapsack Problem
References

5 Prime Test
5.1 …
5.2 Euler Test
5.3 Monte Carlo Method
5.4 Fermat Decomposition and Factor Base Method
5.5 Continued Fraction Method
References

6 Elliptic Curve
6.1 Basic Theory
6.2 Elliptic Curve Public Key Cryptosystem
6.3 …
…

7 Lattice-Based Cryptography
7.1 Geometry of Numbers
7.2 Basic Properties of Lattice
…


Acronyms 


1. $\lfloor x\rfloor$ denotes the largest integer not greater than the real number $x$, and $\lceil x\rceil$ denotes the smallest integer not less than the real number $x$, so that

$$\lfloor x\rfloor \le x < \lfloor x\rfloor + 1, \qquad \lceil x\rceil - 1 < x \le \lceil x\rceil.$$


2. $\mathbb{C}$ denotes the complex field, $\mathbb{R}$ the real number field, $\mathbb{Q}$ the rational number field, and $\mathbb{F}_q$ a finite field of $q$ elements, where $q = p^r$ and $p$ is prime; $\mathbb{Z}$ denotes the ring of integers, and $\mathbb{Z}_m$ the residue class ring modulo $m$ ($m > 1$).

3. $(a, b)$ denotes the greatest common divisor of two integers $a$ and $b$; sometimes it denotes a two-dimensional vector.

4. $a \bmod n$ denotes the minimum nonnegative residue of the integer $a$ modulo $n$, i.e., $0 \le a \bmod n < n$. Sometimes it means the minimum absolute residue, i.e., $|a \bmod n| \le \frac{n}{2}$.

5. Let $\mathbb{F}$ be a field; $\mathbb{F}[x]$ denotes the polynomial ring in one variable over $\mathbb{F}$. Sometimes the variable $T$ is used, i.e., $\mathbb{F}[T]$, where $\mathbb{F} = \mathbb{C}, \mathbb{R}, \mathbb{Q}$, or $\mathbb{F} = \mathbb{F}_q$ is a finite field.

6. The base of the logarithm $\log N$ can be any real number $b > 1$. If $b = 2$, it is the binary logarithm; if $b = q$, it is the base-$q$ logarithm. Sometimes $\log N$ also denotes the natural logarithm, as determined by the context.

7. $P(A)$ denotes the probability of occurrence of the random event $A$.

8. If $G$ is a group and $a \in G$, then $o(a)$ denotes the order of $a$.




Chapter 1
Preparatory Knowledge


Modern cryptography and information theory are rapidly developing branches of mathematics. Almost all areas of mathematics, such as algebra, geometry, analysis, and probability and statistics, have very important applications in information theory. In particular, some modern mathematical theories, such as algebraic geometry, elliptic curves and ergodic theory, play increasingly important roles in coding and cryptography. Information theory can be said to be the most dynamic branch of modern mathematics, with wide applications and strong interdisciplinary connections. This chapter requires the reader to have a preliminary knowledge of analysis, algebra, number theory, and probability and statistics.


1.1 Injective


Let $\sigma$ be a mapping from a nonempty set $A$ to a nonempty set $B$, denoted $A \xrightarrow{\sigma} B$. Three kinds of mappings between sets are of particular importance: injective, surjective and bijective mappings.


Definition 1.1 Let $A \xrightarrow{\sigma} B$ be a mapping between two nonempty sets. We define:

(i) If for all $a, b \in A$, $a \ne b$ implies $\sigma(a) \ne \sigma(b)$, then $\sigma$ is called an injective mapping of $A \to B$, or an injection for short.

(ii) If for every $b \in B$ there exists $a \in A$ such that $\sigma(a) = b$, then $\sigma$ is called a surjective mapping (surjection) of $A \to B$.

(iii) If $\sigma$ is both injective and surjective, then $\sigma$ is called a bijective mapping (bijection) of $A \to B$.

(iv) Let $1_A$ be the identity mapping of $A \to A$, defined by

$$1_A(a) = a, \quad \forall\, a \in A.$$

(v) Suppose $A \xrightarrow{\sigma} B \xrightarrow{\tau} C$ are two mappings. The product mapping of $\tau$ and $\sigma$, $\tau\sigma : A \to C$, is defined by

$$\tau\sigma(a) = \tau(\sigma(a)), \quad \forall\, a \in A.$$


Obviously, the product of two mappings is not commutative in general, but it satisfies the following associative law.


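Property 1 Let $A \xrightarrow{\sigma} B \xrightarrow{\tau} C \xrightarrow{\delta} D$ be three mappings; then

$$(\delta\tau)\sigma = \delta(\tau\sigma). \tag{1.1}$$

Proof It can be verified directly from the definition: both sides send every $a \in A$ to $\delta(\tau(\sigma(a)))$.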


If $A \xrightarrow{\sigma} B$ is a given mapping, then obviously

$$\sigma 1_A = \sigma, \qquad 1_B \sigma = \sigma. \tag{1.2}$$

This formula shows that the identity mapping plays the role of the multiplicative identity in the product of mappings.


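Definition 1.2 (i) Suppose $A \xrightarrow{\sigma} B \xrightarrow{\tau} A$ are two mappings. If $\tau\sigma = 1_A$, then $\tau$ is called a left inverse mapping of $\sigma$, and $\sigma$ is called a right inverse mapping of $\tau$.

(ii) Let $A \xrightarrow{\sigma} B \xrightarrow{\tau} A$. If $\tau\sigma = 1_A$ and $\sigma\tau = 1_B$, then $\tau$ is called an inverse mapping of $\sigma$, denoted $\tau = \sigma^{-1}$.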


The essential properties of injective, surjective and bijective mappings are characterized by the following lemma.


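Lemma 1.1 (i) If $A \xrightarrow{\sigma} B$ has an inverse mapping $B \xrightarrow{\tau} A$, that is, $\sigma\tau = 1_B$ and $\tau\sigma = 1_A$, then $\tau$ is unique (denoted $\tau = \sigma^{-1}$).

(ii) $A \xrightarrow{\sigma} B$ is injective if and only if $\sigma$ has a left inverse mapping $B \xrightarrow{\tau} A$, that is, $\tau\sigma = 1_A$.

(iii) $A \xrightarrow{\sigma} B$ is surjective if and only if $\sigma$ has a right inverse mapping $B \xrightarrow{\tau} A$, that is, $\sigma\tau = 1_B$.

(iv) $A \xrightarrow{\sigma} B$ is bijective if and only if $\sigma$ has an inverse mapping $\tau$, and $\tau$ is unique.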


Proof First we prove (i). Let $B \xrightarrow{\tau_1} A$ and $B \xrightarrow{\tau_2} A$ be two inverse mappings of $\sigma$; then

$$\tau_1\sigma = 1_A, \quad \tau_2\sigma = 1_A, \quad \sigma\tau_1 = 1_B, \quad \sigma\tau_2 = 1_B.$$

From (1.1) and (1.2), we have

$$\tau_1 = \tau_1 1_B = \tau_1(\sigma\tau_2) = (\tau_1\sigma)\tau_2 = 1_A\tau_2 = \tau_2,$$

so if $\sigma$ has an inverse mapping, the inverse mapping is unique.

To prove (ii), note that if $\sigma$ has a left inverse mapping $\tau$, that is, $\tau\sigma = 1_A$, then $\sigma$ must be injective. Indeed, let $a, b \in A$ with $a \ne b$; if $\sigma(a) = \sigma(b)$, then $\tau(\sigma(a)) = \tau(\sigma(b))$, i.e., $a = b$, contradicting $a \ne b$; hence $\sigma(a) \ne \sigma(b)$. Conversely, suppose $A \xrightarrow{\sigma} B$ is injective. For each element $\sigma(a)$ of $\sigma(A) \subseteq B$, let $\tau(\sigma(a)) = a$; this is well defined because $\sigma$ is injective. For the elements of the difference set $B \setminus \sigma(A)$, assign arbitrary images in $A$. Then $B \xrightarrow{\tau} A$ satisfies $\tau\sigma = 1_A$. Similarly, we can prove (iii) and (iv), which completes the proof.


Many books on information theory confuse injective and bijective mappings, but they are two different mathematical concepts, and the distinction deserves attention.
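For finite sets, the constructive half of the proof of Lemma 1.1(ii) can be sketched in Python. This is a minimal illustration under our own assumptions: a mapping $A \to B$ is stored as a dictionary, and the helper names `is_injective`, `left_inverse` and `compose` are hypothetical, not taken from the text.

```python
def is_injective(sigma: dict) -> bool:
    """sigma: A -> B stored as a dict; injective iff no two keys share an image."""
    return len(set(sigma.values())) == len(sigma)


def left_inverse(sigma: dict, B: set) -> dict:
    """Construct tau: B -> A with tau(sigma(a)) = a, as in the proof of Lemma 1.1(ii)."""
    if not sigma:
        raise ValueError("A must be nonempty")
    if not is_injective(sigma):
        raise ValueError("sigma is not injective, so it has no left inverse")
    default = next(iter(sigma))           # a fixed element of A
    tau = {b: default for b in B}         # arbitrary images for B \ sigma(A)
    for a, b in sigma.items():
        tau[b] = a                        # well defined because sigma is injective
    return tau


def compose(tau: dict, sigma: dict) -> dict:
    """Product mapping tau*sigma: a -> tau(sigma(a)) for every a in the domain of sigma."""
    return {a: tau[sigma[a]] for a in sigma}


sigma = {1: "x", 2: "y", 3: "z"}          # injective, but "w" is not in sigma(A)
tau = left_inverse(sigma, {"x", "y", "z", "w"})
print(compose(tau, sigma))                # tau*sigma = 1_A
print(compose(sigma, tau)["w"])           # sigma*tau(w) != w, so sigma*tau != 1_B
```

Here $\tau\sigma = 1_A$, while $\sigma\tau \ne 1_B$ because $w \notin \sigma(A)$; this matches the fact that $\sigma$ is injective but not surjective.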


1.2 Computational Complexity


In a binary computing environment, the complexity of an algorithm is measured by the number of bit operations. A bit, short for "binary digit", is the basic unit of information: one bit represents one digit of the binary system, two bits represent two binary digits, and so on. So what is a "bit operation"?

To understand "bit operation" accurately, we start with the $b$-ary expansion of a real number. Let $b > 1$ be an integer. Any nonnegative real number $x$ can be uniquely expanded into the following series in powers of $b$ (uniqueness holds provided we exclude expansions that end in an infinite string of digits $b - 1$):


$$x = \sum_{-\infty < i \le k-1} d_i b^i = d_{k-1}b^{k-1} + d_{k-2}b^{k-2} + \cdots + d_1 b + d_0 + \sum_{i=1}^{+\infty} d_{-i} b^{-i}, \tag{1.3}$$

where each $d_i$ satisfies $0 \le d_i < b$. So we can express $x$ as


$$x = (d_{k-1}d_{k-2}\cdots d_1 d_0 . d_{-1} d_{-2}\cdots)_b, \tag{1.4}$$


where $(d_{k-1}d_{k-2}\cdots d_0)_b$ is called a $b$-ary integer, $(0.d_{-1}d_{-2}\cdots)_b$ is called a $b$-ary decimal (fraction), and

$$x = (d_{k-1}d_{k-2}\cdots d_0)_b + (0.d_{-1}d_{-2}\cdots)_b. \tag{1.5}$$


If $b = 2$, then $x = (d_{k-1}d_{k-2}\cdots d_0 . d_{-1}\cdots)_2$ is called the binary representation of $x$. If $b = 10$, then

$$x = (d_{k-1}d_{k-2}\cdots d_1 d_0 . d_{-1}d_{-2}\cdots)_{10} = d_{k-1}d_{k-2}\cdots d_1 d_0 . d_{-1}d_{-2}\cdots. \tag{1.6}$$

This is our customary decimal expression. It is worth noting that in any base, integers correspond one-to-one to integers and fractions correspond one-to-one to fractions. For example, integers in the decimal system correspond to integers in the binary system, and likewise for fractions. In other words, the real numbers in the interval $(0, 1)$ correspond one-to-one to the binary fractions $(0.d_{-1}d_{-2}\cdots)_2$. We often ignore binary fractions, yet in fact they are the main technical tool behind various arithmetic codes, such as the Shannon code.
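The expansion (1.3) to (1.5) can be computed digit by digit: repeated division by $b$ yields the integer digits $d_0, d_1, \ldots$, and repeated multiplication of the fractional part by $b$ yields $d_{-1}, d_{-2}, \ldots$. The Python sketch below assumes $x$ is a rational number given as a `fractions.Fraction`, so no floating-point rounding occurs; the function name `b_ary_digits` is our own. This greedy digit extraction produces the expansion that does not end in an infinite tail of digits $b - 1$.

```python
from fractions import Fraction


def b_ary_digits(x: Fraction, b: int, frac_digits: int):
    """Digits of x >= 0 in base b, following (1.3)-(1.5).

    Returns (integer digits d_{k-1} ... d_0, fractional digits d_{-1} ... d_{-frac_digits}).
    """
    if b < 2 or x < 0:
        raise ValueError("need an integer base b >= 2 and x >= 0")
    x = Fraction(x)
    n = x.numerator // x.denominator      # integer part (d_{k-1} ... d_0)_b
    frac = x - n                          # fractional part (0.d_{-1} d_{-2} ...)_b, in [0, 1)
    int_digits = []
    while n > 0:                          # repeated division by b gives d_0, d_1, ...
        n, d = divmod(n, b)
        int_digits.append(d)
    int_digits = int_digits[::-1] or [0]
    frac_out = []
    for _ in range(frac_digits):          # repeated multiplication by b gives d_{-1}, d_{-2}, ...
        frac *= b
        d = frac.numerator // frac.denominator
        frac_out.append(d)
        frac -= d
    return int_digits, frac_out


print(b_ary_digits(Fraction(45, 4), 2, 4))   # 11.25 = (1011.01)_2
print(b_ary_digits(Fraction(1, 3), 10, 5))   # 1/3 = (0.33333...)_10
```

For instance, $45/4 = 11.25$ gives integer digits `[1, 0, 1, 1]` and fractional digits `[0, 1, 0, 0]`, i.e., $(1011.01)_2$.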
Now let us consider the $b$-ary expression of a positive integer $n$ given in the decimal system. Let

$$n = (d_{k-1}d_{k-2}\cdots d_1 d_0)_b, \quad 0 \le d_i < b, \quad d_{k-1} \ne 0;$$

then $k$ is the number of $b$-ary digits of $n$.


Lemma 1.2 The number $k$ of $b$-ary digits of a positive integer $n$ can be calculated by the formula

$$k = \lfloor \log_b n \rfloor + 1, \tag{1.7}$$

where $\lfloor x \rfloor$ denotes the largest integer not greater than the real number $x$.


Proof Because $d_{k-1} \ne 0$, we have $b^{k-1} \le n < b^k$, that is,

$$k - 1 \le \log_b n < k.$$

The left inequality gives $k - 1 \le \lfloor \log_b n \rfloor$, and the right one gives $\lfloor \log_b n \rfloor + 1 \le k$; together,

$$k = \lfloor \log_b n \rfloor + 1.$$

This completes the proof of Lemma 1.2.
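Lemma 1.2 is easy to check numerically. The Python sketch below (the function names are illustrative) counts digits by repeated division and computes $\lfloor \log_b n \rfloor$ with exact integer arithmetic, since evaluating $\log_b n$ in floating point can be off by one when $n$ is close to a power of $b$.

```python
def num_digits(n: int, b: int) -> int:
    """Number k of b-ary digits of n >= 1, counted by repeated division."""
    k = 0
    while n > 0:
        n //= b
        k += 1
    return k


def floor_log(n: int, b: int) -> int:
    """Exact floor(log_b n) for n >= 1: the largest e with b**e <= n."""
    e, power = 0, b
    while power <= n:
        power *= b
        e += 1
    return e


# Lemma 1.2: k = floor(log_b n) + 1
for b in (2, 3, 10, 16):
    for n in range(1, 10_000):
        assert num_digits(n, b) == floor_log(n, b) + 1

print(num_digits(120, 2))   # 120 = (1111000)_2 has 7 binary digits
```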


Now let us look at addition in the $b$-ary system. For simplicity, we consider the addition of two positive integers in the binary system. Let $n = (1111000)_2$ and $m = (11110)_2$; then $n + m = 1111000 + 0011110 = 10010110$, that is, $n + m = (10010110)_2$. The addition at a single bit position actually involves the following five contents (or operations).


1. Observe the two digits at the same position and note whether there is a carry from the position on the right (every two makes a carry of one).

2. If both digits at the position are 0 and there is no carry from the right, the sum digit is 0.

3. If both digits are 0 but there is a carry, or if one digit is 0 and the other is 1 and there is no carry, the sum digit at this position is 1.

4. If one digit is 0 and the other is 1 and there is a carry, or if both digits are 1 and there is no carry, the sum digit is 0 and a carry is passed to the left.

5. If both digits are 1 and there is a carry, the sum digit is 1 and a carry is passed to the left.
Definition 1.3 A bit operation is the addition performed at a single bit position in binary addition, as described above. Suppose $A$ is an algorithm in the binary system; we use Time$(A)$ to denote the total number of bit operations performed by algorithm $A$.


From the definition, it is easy to deduce the number of bit operations for binary addition and subtraction. Let $n, m$ be two positive integers whose binary expressions have $k$ and $l$ bits, respectively; then

$$\text{Time}(n + m) = \max\{k, l\}. \tag{1.8}$$
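The five rules above amount to processing one position at a time, from right to left, while propagating a carry. A minimal Python sketch follows; it assumes the inputs are binary strings and counts one bit operation per position, in line with Definition 1.3, so the count equals $\max\{k, l\}$ as in (1.8). The name `add_binary` is our own.

```python
def add_binary(x: str, y: str):
    """Add two binary strings position by position, right to left.

    Returns (sum as a binary string, number of bit operations), counting one
    bit operation per position as in Definition 1.3.
    """
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)    # pad the shorter number with leading 0s
    carry, digits, ops = 0, [], 0
    for i in range(width - 1, -1, -1):       # start from the rightmost position
        s = int(x[i]) + int(y[i]) + carry    # two digits plus the carry from the right
        digits.append(str(s % 2))            # digit written at this position (rules 2 to 5)
        carry = s // 2                       # carry passed to the next position on the left
        ops += 1
    if carry:
        digits.append("1")                   # a final carry becomes a new leading digit
    return "".join(reversed(digits)), ops


print(add_binary("1111000", "11110"))        # ('10010110', 7)
```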


In the same way, the number of bit operations required to multiply $n$ and $m$ in the binary system satisfies

$$\text{Time}(nm) \le (k + l)\cdot\min\{k, l\} \le 2kl. \tag{1.9}$$


It is very convenient to estimate the number of bit operations by using the symbol "$O$" commonly used in number theory. Let $f(x)$ and $g(x)$ be two real valued functions with $g(x) > 0$. If there are two absolute constants $B$ and $C$ such that

$$|f(x)| \le C g(x) \quad \text{whenever } |x| > B,$$

we write $f(x) = O(g(x))$. This notation indicates that, as $x \to \infty$, $f(x)$ grows no faster than a constant multiple of $g(x)$. For example, let $f(x) = a_d x^d + a_{d-1}x^{d-1} + \cdots + a_1 x + a_0$ $(a_d > 0)$; then

$$f(x) = O(|x|^d), \quad \text{or} \quad f(n) = O(n^d), \; n \ge 1.$$

For any $\varepsilon > 0$, we have

$$\log n = O(n^{\varepsilon}), \quad n \ge 1.$$


From Lemma 1.2, (1.8) and (1.9), we have the following lemma.


Lemma 1.3 Let $n, m$ be two positive integers whose binary expressions have $k$ and $l$ bits, respectively. If $m \le n$, then $l \le k$, and

$$\text{Time}(n + m) = O(k) = O(\log n);$$
$$\text{Time}(nm) = O(kl) = O(\log n \log m);$$
$$\text{Time}\left(\frac{n}{m}\right) = O(kl) = O(\log n \log m).$$

In the above lemma, division (with remainder) is handled in the same way as multiplication. Next, we discuss the number of bit operations required to convert a binary representation into a decimal representation, and the number required to compute $n!$ in binary.
Lemma 1.4 Let $k$ be the number of binary digits of $n$; then

$$\text{Time}(\text{convert } n \text{ to decimal}) = O(k^2) = O(\log^2 n)$$

and

$$\text{Time}(n!) = O(n^2 k^2) = O(n^2 \log^2 n).$$


Proof To convert $n = (d_{k-1}d_{k-2}\cdots d_1 d_0)_2$ to its decimal expression, divide $n$ by $10 = (1010)_2$. The remainder is one of the binary numbers 0, 1, 10, 11, 100, 101, 110, 111, 1000 or 1001, which correspond to the digits 0 to 9; denote it by $a_0$ $(0 \le a_0 \le 9)$ and take $a_0$ as the units digit of $n$. Similarly, divide the quotient by $10 = (1010)_2$ and convert the remainder into a digit from 0 to 9, which is the tens digit of $n$. Continuing in this way, we perform $\lfloor \log_{10} n \rfloor + 1 \le k$ divisions, each requiring $O(4k)$ bit operations, so

$$\text{Time}(\text{convert } n \text{ to decimal}) \le k \cdot O(4k) = O(k^2).$$

For $n!$, we compute $1 \cdot 2 \cdots n$ by $n - 1$ successive multiplications. Each partial product $j! \le n^n$ has at most $nk$ binary digits and each factor has at most $k$ digits, so by (1.9) each multiplication costs $O(nk \cdot k)$ bit operations, and in total $\text{Time}(n!) = O(n^2 k^2)$. This completes the proof of Lemma 1.4.


Let us deduce the computational complexity of some common number-theoretic algorithms. Let $m$ and $n$ be two positive integers; then there is a unique integer $r$ with $0 \le r < n$ such that $m \equiv r \pmod n$. We call $r$ the smallest nonnegative residue of $m$ modulo $n$ and denote it by $r = m \bmod n$. If $1 \le m \le n$, the Euclidean algorithm (repeated division with remainder) is usually used to find the greatest common divisor $(n, m)$ of $n$ and $m$. If $(m, n) = 1$, then there is a positive integer $a$ such that $ma \equiv 1 \pmod n$; $a$ is called the multiplicative inverse of $m$ modulo $n$, denoted $m^{-1} \bmod n$. By Bézout's identity, if $(n, m) = 1$, then there are integers $x$ and $y$ such that $xm + yn = 1$; we usually use the extended Euclidean algorithm to find $x$ and $y$. Finding $x$ amounts to calculating $m^{-1} \bmod n$. Under the above definitions and notations, we have the following lemma.
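The extended Euclidean algorithm mentioned above can be sketched in Python as follows. This version carries the Bézout coefficients forward through the loop instead of substituting back from the last division as described in the text; both yield integers $x, y$ with $xm + yn = (m, n)$. The helper names `extended_gcd` and `inverse_mod` are hypothetical.

```python
def extended_gcd(m: int, n: int):
    """Return (g, x, y) with g = gcd(m, n) and x*m + y*n = g.

    Invariant: r0 = x0*m + y0*n and r1 = x1*m + y1*n throughout the loop.
    """
    r0, r1 = m, n
    x0, x1 = 1, 0
    y0, y1 = 0, 1
    while r1 != 0:
        q = r0 // r1                       # one division with remainder
        r0, r1 = r1, r0 - q * r1
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return r0, x0, y0


def inverse_mod(m: int, n: int) -> int:
    """m^{-1} mod n, assuming (m, n) = 1."""
    g, x, _ = extended_gcd(m, n)
    if g != 1:
        raise ValueError("m is not invertible modulo n")
    return x % n


print(inverse_mod(17, 3120))               # 2753, since 17 * 2753 = 15 * 3120 + 1
```

Python 3.8 and later also provide `pow(m, -1, n)` for the same purpose, which is convenient as a cross-check.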


Lemma 1.5 (i) Suppose $m$ and $n$ are two positive integers; then

$$\text{Time}(\text{calculate } m \bmod n) = O(\log n \cdot \log m).$$

(ii) Suppose $m$ and $n$ are two positive integers with $m \le n$; then

$$\text{Time}(\text{calculate } (n, m)) = O(\log^3 n).$$

(iii) Suppose $m$ and $n$ are two positive integers with $(m, n) = 1$; then

$$\text{Time}(\text{calculate } m^{-1} \bmod n) = O(\log^3 \max(n, m)).$$

(iv) Suppose $n, m, b$ are positive integers with $b < n$; then

$$\text{Time}(b^m \bmod n) = O(\log m \cdot \log^2 n).$$
Proof Finding the minimum nonnegative residue $r$ of $m$ modulo $n$ is just a division with remainder:

$$m = kn + r, \quad 0 \le r < n.$$

By Lemma 1.3,

$$\text{Time}(\text{calculate } m \bmod n) = O(\log n \cdot \log m),$$

so (i) holds. The Euclidean algorithm for the greatest common divisor $(n, m)$ consists of $O(\log n)$ divisions with remainder (every two steps at least halve the remainder), and each division costs $O(\log^2 n)$ bit operations by Lemma 1.3, so

$$\text{Time}(\text{calculate } (n, m)) = O(\log^3 n).$$

In the Euclidean algorithm, we can obtain $x$ and $y$ with $xm + yn = 1$ by substituting back from the last division to the first; this process is called the extended Euclidean algorithm, and its cost is of the same order as that of the Euclidean algorithm itself. Therefore, if $m \le n$, then

$$\text{Time}(\text{calculate } m^{-1} \bmod n) = O(\text{Time}(\text{calculate } (n, m))) = O(\log^3 n),$$

so (ii) and (iii) hold.


(iv) concerns the computational complexity of a power of an integer modulo $n$; the proof uses the famous "repeated squaring method". Let

$$m = (m_{k-1}m_{k-2}\cdots m_1 m_0)_2 = m_0 + 2m_1 + 4m_2 + \cdots + 2^{k-1}m_{k-1}$$

be the binary representation of $m$, where $m_i = 0$ or $1$. First, let $a = 1$. If $m_0 = 1$, replace $a$ with $b$; if $m_0 = 0$, $a = 1$ remains unchanged. Let $b_1 = b^2 \bmod n$; this is the first squaring. If $m_1 = 1$, replace $a$ with $ab_1 \bmod n$; if $m_1 = 0$, $a$ remains unchanged. Let $b_2 = b_1^2 \bmod n$; this is the second squaring. Continuing up to the $j$-th squaring, we have

$$b_j \equiv b^{2^j} \pmod n.$$

The calculation ends after the $(k-1)$-th squaring and the corresponding step for $m_{k-1}$; at this time

$$a \equiv b^{m_0 + 2m_1 + 4m_2 + \cdots + 2^{k-1}m_{k-1}} = b^m \pmod n.$$

Each squaring or multiplication involves numbers less than $n$, so together with the reduction modulo $n$ it requires $O(\log^2 n)$ bit operations. There are $k$ such steps in total, and $k = O(\log m)$. So (iv) holds, and the proof is complete.
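The repeated squaring method in the proof of (iv) translates directly into a loop over the binary digits $m_0, m_1, \ldots$ of the exponent. The Python sketch below is an illustration (the name `power_mod` is ours); Python's built-in three-argument `pow(b, m, n)` computes the same value and serves as a check.

```python
def power_mod(b: int, m: int, n: int) -> int:
    """Compute b^m mod n by repeated squaring over the binary digits of m."""
    a = 1 % n                     # start with a = 1 (reduced in case n = 1)
    b_j = b % n                   # b_0 = b; after j squarings, b_j = b^(2^j) mod n
    while m > 0:
        if m & 1:                 # current binary digit m_j of the exponent is 1
            a = (a * b_j) % n     # multiply a by b^(2^j)
        m >>= 1                   # move on to the next binary digit
        if m:                     # square only while digits remain (k - 1 squarings in total)
            b_j = (b_j * b_j) % n
    return a


print(power_mod(3, 200, 1000))    # agrees with pow(3, 200, 1000)
```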


Definition 1.4 Suppose an algorithm $f$ involves positive integers $n_1, n_2, \ldots, n_r$ whose numbers of binary digits are $k_1, k_2, \ldots, k_r$. If there are absolute nonnegative integers $d_1, d_2, \ldots, d_r$ such that

$$\text{Time}(f) = O(k_1^{d_1} k_2^{d_2} \cdots k_r^{d_r}), \tag{1.10}$$

then the complexity of algorithm $f$ is called polynomial; otherwise, it is called nonpolynomial.


From Lemmas 1.3 and 1.4, we see that addition, subtraction, multiplication and division of positive integers are polynomial algorithms, whereas computing $n!$ is the simplest example of a nonpolynomial algorithm. If we do not need the exact value of $n!$ but only an approximation, we can obtain one by a polynomial algorithm based on the Stirling formula (see Sect. 1.4 of this chapter). In (1.10), if $d_1 = d_2 = \cdots = d_r = 0$, the complexity of algorithm $f$ is called constant; if $d_1 = d_2 = \cdots = d_r = 1$, it is called linear (quadratic, cubic, etc. are defined similarly). To characterize nonpolynomial algorithms, we introduce two concepts: exponential and subexponential algorithms.


Definition 1.5 Suppose an algorithm $f$ involves a positive integer $n$ with $k$ binary digits. If

$$\text{Time}(f) = O(t^{g(k)}), \tag{1.11}$$

where $t > 1$ is a constant and $g(k)$ is a polynomial in $k$ with $\deg g \ge 1$, then the computational complexity of $f$ is called exponential. If $g(k)$ is not a polynomial but grows more slowly than any polynomial of degree at least 1, such as $g(k) = \sqrt{k \log k}$ (with $t = e$ this gives $e^{\sqrt{k\log k}}$), then the computational complexity of $f$ is called subexponential.


By this definition, consider the computational complexity of $n!$. Let $k$ be the number of binary digits of $n$. By Lemma 1.2, $n = O(2^k)$, and then by Lemma 1.4,

$$\text{Time}(n!) = O(n^2 k^2) = O(k^2 2^{2k}) = O(2^{3k}).$$

So computing $n!$ in the binary system has exponential computational complexity; this is the simplest example of an exponential algorithm.

Bit operations can be used not only to define computational complexity but also to describe the running speed of a computer and the time complexity of an algorithm. The so-called speed of a computer is the total number of bit operations it can complete per unit time (such as one second or one microsecond). Therefore, there is no essential difference between the computational complexity and the time complexity of an algorithm. To illustrate, suppose a computer can complete $10^6$ bit operations per second. When the number of binary digits of the input is $k = 10^6$, Table 1.1 lists the running time on this computer of algorithms with different computational complexities.

Table 1.1 Time requirements of algorithms with different computational complexity ($k = 10^6$)

Algorithm type   | Complexity               | Number of bit operations | Time
Constant         | $O(1)$                   | $1$                      | 1 microsecond
Linear           | $O(k)$                   | $10^6$                   | 1 s
Quadratic        | $O(k^2)$                 | $10^{12}$                | 11.6 days
Cubic            | $O(k^3)$                 | $10^{18}$                | 32000 years
Subexponential   | $O(e^{\sqrt{k\log k}})$  | …                        | …
Exponential      | $O(2^k)$                 | $10^{301030}$            | $3 \times 10^{301016}$ years

Note that 1 year $\approx 3 \times 10^7$ seconds and the age of the universe is about $10^{10}$ years. When the number of binary digits $k$ is large, an algorithm with exponential or subexponential computational complexity is practically impossible to complete on a computer; the only remaining recourse is to improve the speed of the computer.

Computational complexity is often used to describe the complexity of a problem, because when the hardware conditions (such as computing speed and storage capacity) remain unchanged, computational complexity is also time complexity. At present, the complexity of algorithms is defined in a model called the Turing machine, a finite-state machine equipped with an unbounded tape that it can both read and write. If the result of each operation and the next operation are uniquely determined, the Turing machine is called deterministic. Accordingly, a polynomial algorithm in the sense above is understood to run on a deterministic Turing machine.
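Most rows of Table 1.1 can be recomputed from two assumptions stated in the text: $10^6$ bit operations per second and 1 year $\approx 3 \times 10^7$ seconds. The Python sketch below works with base-10 logarithms of the operation counts so that $2^{10^6}$ never has to be formed. For the subexponential row it additionally assumes that $\log$ in $e^{\sqrt{k\log k}}$ is the natural logarithm; that choice is ours, not something the text specifies. Function names are illustrative.

```python
import math

OPS_PER_SECOND = 10**6            # from the text: 10^6 bit operations per second
SECONDS_PER_YEAR = 3 * 10**7      # from the text: 1 year is about 3 x 10^7 seconds


def log10_ops(kind: str, k: int) -> float:
    """log10 of the number of bit operations for a row of Table 1.1."""
    if kind == "constant":
        return 0.0
    if kind == "linear":
        return math.log10(k)
    if kind == "quadratic":
        return 2 * math.log10(k)
    if kind == "cubic":
        return 3 * math.log10(k)
    if kind == "subexponential":             # e^{sqrt(k ln k)}; natural log assumed
        return math.sqrt(k * math.log(k)) / math.log(10)
    if kind == "exponential":                # 2^k
        return k * math.log10(2)
    raise ValueError(f"unknown row: {kind}")


def log10_seconds(kind: str, k: int) -> float:
    return log10_ops(kind, k) - math.log10(OPS_PER_SECOND)


def log10_years(kind: str, k: int) -> float:
    return log10_seconds(kind, k) - math.log10(SECONDS_PER_YEAR)


for kind in ("linear", "quadratic", "cubic", "subexponential", "exponential"):
    print(f"{kind:15s} ops ~ 10^{log10_ops(kind, 10**6):.2f}, "
          f"years ~ 10^{log10_years(kind, 10**6):.2f}")
```

Under the natural-logarithm assumption, the subexponential row comes to roughly $10^{1614}$ bit operations; with a different base for $\log$ the figure changes considerably.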


Definition 1.6 If a problem can be solved by a polynomial algorithm on a deterministic Turing machine, it is called a $P$ class problem; $P$ class problems are often called tractable (easy to handle) problems. If a problem can be solved by a polynomial algorithm on a nondeterministic Turing machine, it is called an $NP$ class problem.


By definition, every $P$ class problem is an $NP$ class problem: if it can be solved by a polynomial algorithm on a deterministic Turing machine, it can also be solved by a polynomial algorithm on a nondeterministic Turing machine. Conversely, is the class $NP$ strictly larger than $P$? This is an open problem in theoretical computer science: there is neither a proof nor a counterexample showing that some problem solvable in polynomial time on a nondeterministic Turing machine cannot be solved in polynomial time on a deterministic one. It is widely conjectured that $P \ne NP$, and this conjecture is also the cornerstone of many cryptosystems.


1.3 Jensen Inequality 


A real valued function $f(x)$ on the interval $(a, b)$ is called strictly concave if for all $x_1, x_2 \in (a, b)$ and $\lambda_1 > 0$, $\lambda_2 > 0$ with $\lambda_1 + \lambda_2 = 1$, we have

$$\lambda_1 f(x_1) + \lambda_2 f(x_2) \le f(\lambda_1 x_1 + \lambda_2 x_2),$$

with equality if and only if $x_1 = x_2$. By induction, we can prove the following Jensen inequality.




Lemma 1.6 If $f(x)$ is a strictly concave function on $(a, b)$, then for any positive integer $n \ge 1$, any positive numbers $\lambda_i$ $(1 \le i \le n)$ with $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$, and any $x_i \in (a, b)$ $(1 \le i \le n)$, we have

$$\sum_{i=1}^n \lambda_i f(x_i) \le f\left(\sum_{i=1}^n \lambda_i x_i\right), \tag{1.12}$$

with equality if and only if $x_1 = x_2 = \cdots = x_n$.


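Proof We use induction. The proposition holds for $n = 1$ and $n = 2$. Suppose it holds for $n - 1$, where $n > 2$. Let

$$x' = \frac{\lambda_1}{\lambda_1 + \lambda_2} x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} x_2;$$

then $x' \in (a, b)$ and $(\lambda_1 + \lambda_2)x' = \lambda_1 x_1 + \lambda_2 x_2$. Therefore, applying the case $n = 2$ and then the induction hypothesis to the $n - 1$ points $x', x_3, \ldots, x_n$,

$$\sum_{i=1}^n \lambda_i f(x_i) = \lambda_1 f(x_1) + \lambda_2 f(x_2) + \sum_{i=3}^n \lambda_i f(x_i) \le (\lambda_1 + \lambda_2) f(x') + \sum_{i=3}^n \lambda_i f(x_i) \le f(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n).$$

Equality holds throughout if and only if $x_1 = x_2$ and $x' = x_3 = \cdots = x_n$, that is, $x_1 = x_2 = \cdots = x_n$. So the proposition holds for $n$, and (1.12) is proved.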


From mathematical analysis, a twice differentiable function $f(x)$ is strictly concave on $(a, b)$ if $f''(x) < 0$ there. Take $f(x) = \log x$ (base 2); then $f''(x) = -\frac{1}{x^2 \ln 2} < 0$, so $\log x$ is strictly concave on $(0, +\infty)$, and the Jensen inequality gives the following inequality.


Lemma 1.7 Let $g(x)$ be a positive function, that is, $g(x) > 0$. Then for any positive numbers $\lambda_i$ $(1 \le i \le n)$ with $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and any $a_1, a_2, \ldots, a_n$, we have

$$\sum_{i=1}^n \lambda_i \log g(a_i) \le \log \sum_{i=1}^n \lambda_i g(a_i), \tag{1.13}$$

with equality if and only if $g(a_1) = g(a_2) = \cdots = g(a_n)$.


Proof Because $\log x$ is strictly concave, let $x_i = g(a_i)$; then $x_i \in (0, +\infty)$ $(1 \le i \le n)$, and by the Jensen inequality,

$$\sum_{i=1}^n \lambda_i \log g(a_i) = \sum_{i=1}^n \lambda_i \log x_i \le \log\left(\sum_{i=1}^n \lambda_i x_i\right) = \log\left(\sum_{i=1}^n \lambda_i g(a_i)\right).$$

So the lemma holds.
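Inequality (1.13) is easy to test numerically. The Python sketch below (names are ours) normalizes arbitrary positive weights so that they sum to 1, uses base-2 logarithms, and compares both sides of (1.13) on random inputs; a small tolerance absorbs floating-point rounding.

```python
import math
import random


def log_jensen_sides(weights, values):
    """Both sides of (1.13) with base-2 logs: (sum l_i log g_i, log sum l_i g_i).

    `values` stands for g(a_1), ..., g(a_n) > 0; weights are normalised to sum to 1.
    """
    total = sum(weights)
    lam = [w / total for w in weights]
    lhs = sum(l * math.log2(v) for l, v in zip(lam, values))
    rhs = math.log2(sum(l * v for l, v in zip(lam, values)))
    return lhs, rhs


rng = random.Random(2021)
for _ in range(1000):
    n = rng.randint(1, 8)
    w = [rng.uniform(0.1, 1.0) for _ in range(n)]
    g = [rng.uniform(0.01, 100.0) for _ in range(n)]
    lhs, rhs = log_jensen_sides(w, g)
    assert lhs <= rhs + 1e-12            # inequality (1.13)

# Equality case: all g(a_i) equal, so both sides are (up to rounding) log2(5).
print(log_jensen_sides([1, 2, 3], [5.0, 5.0, 5.0]))
```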


A real valued function $f(x)$ is called strictly convex on the interval $(a, b)$ if for all $x_1, x_2 \in (a, b)$ and $\lambda_1 > 0$, $\lambda_2 > 0$ with $\lambda_1 + \lambda_2 = 1$, we have

$$f(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 f(x_1) + \lambda_2 f(x_2),$$

with equality if and only if $x_1 = x_2$. By induction, we can prove the following general inequality.


Lemma 1.8 If $f(x)$ is a strictly convex function on $(a, b)$, then for any positive integer $n \ge 2$, any positive numbers $\lambda_i$ $(1 \le i \le n)$ with $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$, and any $x_i \in (a, b)$ $(1 \le i \le n)$, we have

$$f\left(\sum_{i=1}^n \lambda_i x_i\right) \le \sum_{i=1}^n \lambda_i f(x_i), \tag{1.14}$$

with equality if and only if $x_1 = x_2 = \cdots = x_n$.


A twice differentiable function $f(x)$ is strictly convex on $(a, b)$ if $f''(x) > 0$ there. Let $f(x) = x \log x$ (base 2); then $f''(x) = \frac{1}{x \ln 2} > 0$ for $x \in (0, +\infty)$. Hence we have the following logarithmic inequality.


Lemma 1.9 If $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are two groups of positive numbers, then
$$\sum_{i=1}^{n} a_i \log\frac{a_i}{b_i} \ge \Big(\sum_{i=1}^{n} a_i\Big)\log\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}. \tag{1.15}$$


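Proof Because $f(x) = x\log x$ is a strictly convex function, from Lemma 1.8 we have
$$f\Big(\sum_{i=1}^{n}\lambda_i x_i\Big) \le \sum_{i=1}^{n}\lambda_i f(x_i),$$
where $\sum_{i=1}^{n}\lambda_i = 1$. Take $\lambda_i = b_i / \sum_{j=1}^{n} b_j$ and $x_i = a_i / b_i$; then $\sum_{i=1}^{n}\lambda_i x_i = \sum_{i=1}^{n} a_i / \sum_{j=1}^{n} b_j$, and
$$\frac{\sum_{i=1}^{n} a_i}{\sum_{j=1}^{n} b_j}\log\frac{\sum_{i=1}^{n} a_i}{\sum_{j=1}^{n} b_j} \le \sum_{i=1}^{n}\frac{b_i}{\sum_{j=1}^{n} b_j}\cdot\frac{a_i}{b_i}\log\frac{a_i}{b_i} = \frac{1}{\sum_{j=1}^{n} b_j}\sum_{i=1}^{n} a_i\log\frac{a_i}{b_i}.$$
Multiplying both sides by $\sum_{j=1}^{n} b_j$, we obtain
$$\Big(\sum_{i=1}^{n} a_i\Big)\log\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i} \le \sum_{i=1}^{n} a_i\log\frac{a_i}{b_i},$$
thus (1.15) holds.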


The above inequality is called the log sum inequality; it is often used in information theory.
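The log sum inequality (1.15) is easy to test numerically. In the following sketch (base-2 logarithms assumed, random positive inputs, illustrative function name `log_sum_gap`), the difference between the two sides of (1.15) should never be negative; when all ratios $a_i/b_i$ are equal, the two sides coincide.

```python
import math
import random

def log_sum_gap(a, b):
    """Left side minus right side of (1.15), with base-2 logarithms."""
    lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
    A, B = sum(a), sum(b)
    return lhs - A * math.log2(A / B)

rng = random.Random(1)
for _ in range(1000):
    n = rng.randint(1, 8)
    a = [rng.uniform(0.01, 10.0) for _ in range(n)]
    b = [rng.uniform(0.01, 10.0) for _ in range(n)]
    assert log_sum_gap(a, b) >= -1e-9   # small slack for rounding

# Equal ratios a_i / b_i = 1/2: both sides of (1.15) equal -6.
print(log_sum_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```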


1.4 Stirling Formula


In number theory (see reference 1's Apostol 1976), the Euler summation formula yields asymptotic formulas for the partial sums of some arithmetic functions, the most important of which is the following Stirling formula. For all real numbers $x \ge 2$, we have
$$\sum_{1 \le m \le x}\log m = x\log x - x + O(\log x), \tag{1.16}$$
where the $O$-constant is an absolute constant. Taking $x = n \ge 2$ to be a positive integer, we obtain the Stirling formula
$$\log n! = n\log n - n + O(\log n). \tag{1.17}$$
In number theory, the Stirling formula also appears in the more precise form
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n,$$
or
$$\lim_{n\to\infty}\frac{n!}{\sqrt{2\pi n}\,(n/e)^n} = 1.$$
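To see how good the two forms of the Stirling formula are, the sketch below compares $\log n!$ with $n\log n - n$ and with $\log\big(\sqrt{2\pi n}(n/e)^n\big)$. It uses natural logarithms and Python's `math.lgamma` (which returns $\log\Gamma(n+1) = \log n!$) to avoid huge integers; the helper names are illustrative. The middle column, the error of (1.17) divided by $\log n$, stays bounded, and the last column tends to 1.

```python
import math

def log_factorial(n):
    """Natural logarithm of n!, via log Gamma(n + 1) to avoid huge integers."""
    return math.lgamma(n + 1)

def stirling_log(n):
    """Natural logarithm of sqrt(2*pi*n) * (n/e)**n."""
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

for n in (10, 100, 1000, 10**6):
    crude_error = log_factorial(n) - (n * math.log(n) - n)  # the O(log n) term in (1.17)
    ratio = math.exp(log_factorial(n) - stirling_log(n))    # n! / (sqrt(2 pi n) (n/e)^n)
    print(n, round(crude_error / math.log(n), 4), ratio)
```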
Lemma 1.10 Let $0 \le m \le n$, where $n, m$ are nonnegative integers, and let $\binom{n}{m}$ be the binomial coefficient. Then
$$\binom{n}{m} \le \frac{n^n}{m^m(n-m)^{n-m}}, \tag{1.18}$$
with the convention $0^0 = 1$.
Proof By the binomial theorem,
$$n^n = (m + (n-m))^n \ge \binom{n}{m}m^m(n-m)^{n-m},$$
and (1.18) follows at once.
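Lemma 1.10 can be checked exhaustively for small $n$ with exact integer arithmetic. The sketch below (the helper name `entropy_bound` is illustrative) compares $\binom{n}{m}\,m^m(n-m)^{n-m}$ with $n^n$, which avoids any division; Python evaluates `0 ** 0` as 1, matching the convention $0^0 = 1$.

```python
from math import comb

def entropy_bound(n, m):
    """Return (numerator, denominator) of n^n / (m^m (n-m)^(n-m)), with 0^0 = 1."""
    return n ** n, m ** m * (n - m) ** (n - m)

for n in range(30):
    for m in range(n + 1):
        num, den = entropy_bound(n, m)
        # C(n, m) <= num / den  is equivalent to  C(n, m) * den <= num, since den > 0
        assert comb(n, m) * den <= num
print("Lemma 1.10 holds for all 0 <= m <= n < 30")
```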


We define the binary entropy function $H(x)$ $(0 \le x \le 1)$ as follows:
$$H(x) = \begin{cases} 0, & \text{if } x = 0 \text{ or } x = 1,\\ -x\log x - (1-x)\log(1-x), & \text{if } 0 < x < 1. \end{cases} \tag{1.19}$$
Clearly $H(x) = H(1-x)$, so we only need to consider the case $0 \le x \le \frac{1}{2}$. $H(x)$ is the information entropy of the binary information space (see Example 3.5 in Sect. 1.1 of Chap. 3); its graph is shown in Fig. 1.1.


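Lemma 1.11 Let $0 < \lambda \le \frac{1}{2}$. Then
(i) $\sum_{0 \le i \le \lambda n}\binom{n}{i} \le 2^{nH(\lambda)}$;
(ii) $\log\sum_{0 \le i \le \lambda n}\binom{n}{i} \le nH(\lambda)$;
(iii) $\lim_{n\to\infty}\frac{1}{n}\log\sum_{0 \le i \le \lambda n}\binom{n}{i} = H(\lambda)$.


Proof We first prove (i); (ii) then follows directly by taking base-2 logarithms of (i).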


[Fig. 1.1 The information entropy of binary information space]
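$$\begin{aligned}
1 = (\lambda + (1-\lambda))^n &\ge \sum_{0\le i\le\lambda n}\binom{n}{i}\lambda^i(1-\lambda)^{n-i}\\
&= (1-\lambda)^n\sum_{0\le i\le\lambda n}\binom{n}{i}\Big(\frac{\lambda}{1-\lambda}\Big)^i\\
&\ge (1-\lambda)^n\sum_{0\le i\le\lambda n}\binom{n}{i}\Big(\frac{\lambda}{1-\lambda}\Big)^{\lambda n}\\
&= \lambda^{\lambda n}(1-\lambda)^{(1-\lambda)n}\sum_{0\le i\le\lambda n}\binom{n}{i}\\
&= 2^{-nH(\lambda)}\sum_{0\le i\le\lambda n}\binom{n}{i},
\end{aligned}$$
where the second inequality uses $0 < \frac{\lambda}{1-\lambda} \le 1$ and $i \le \lambda n$. This proves (i).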


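To prove (iii), write $m = [\lambda n] = \lambda n + O(1)$. From (ii), we have
$$\frac{1}{n}\log\sum_{0\le i\le\lambda n}\binom{n}{i} \le H(\lambda).$$
On the other hand,
$$\frac{1}{n}\log\sum_{0\le i\le\lambda n}\binom{n}{i} \ge \frac{1}{n}\log\binom{n}{m} = \frac{1}{n}\big(\log n! - \log m! - \log(n-m)!\big).$$
From the Stirling formula (1.17), we have
$$\log n! - \log m! - \log(n-m)! = n\log n - m\log m - (n-m)\log(n-m) + O(\log n).$$
So
$$\begin{aligned}
\frac{1}{n}\log\sum_{0\le i\le\lambda n}\binom{n}{i} &\ge \log n - \lambda\log(\lambda n) - (1-\lambda)\log\big((1-\lambda)n\big) + O\Big(\frac{\log n}{n}\Big)\\
&= -\lambda\log\lambda - (1-\lambda)\log(1-\lambda) + O\Big(\frac{\log n}{n}\Big)\\
&= H(\lambda) + O\Big(\frac{\log n}{n}\Big).
\end{aligned}$$
Combining the two bounds, we finally have
$$\lim_{n\to\infty}\frac{1}{n}\log\sum_{0\le i\le\lambda n}\binom{n}{i} = H(\lambda).$$
Lemma 1.11 holds.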
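The binary entropy function (1.19) and Lemma 1.11 can be explored numerically. The sketch below uses base-2 logarithms and exact big-integer binomial sums; the value $\lambda = 0.3$ and the function names are illustrative choices. The assertion checks part (ii), and the printed ratio approaches $H(\lambda)$ as part (iii) says, with a gap that shrinks roughly like $(\log n)/n$.

```python
import math

def H(x):
    """Binary entropy function (1.19) in bits, with H(0) = H(1) = 0."""
    if x in (0, 1):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def log_ball_volume(n, lam):
    """log2 of the sum of C(n, i) over 0 <= i <= lam * n (exact integer sum)."""
    total = sum(math.comb(n, i) for i in range(math.floor(lam * n) + 1))
    return math.log2(total)

lam = 0.3
for n in (10, 100, 1000, 5000):
    v = log_ball_volume(n, lam)
    assert v <= n * H(lam) + 1e-9     # Lemma 1.11 (ii)
    print(n, v / n, H(lam))           # v / n approaches H(lam): Lemma 1.11 (iii)
```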
1.5 n-fold Bernoulli Experiment


In a given probability space, let $x$ and $y$ be random events. We denote the probability that event $x$ occurs by $p(x)$, the probability that $x$ and $y$ occur jointly by $p(xy)$, and the probability that $x$ occurs given that event $y$ has occurred by $p(x|y)$, which is called the conditional probability. Obviously, there is the following multiplication formula:
$$p(xy) = p(y)\,p(x|y). \tag{1.20}$$


For two events $x$ and $y$: if $p(xy) = 0$, we say that $x$ and $y$ are incompatible; if $p(xy) = p(x)p(y)$, we say that the two events are independent (of each other).
A finite set of events $\{x_1, x_2, \ldots, x_n\}$ is called a complete event group if
$$\sum_{i=1}^{n} p(x_i) = 1, \quad\text{and}\quad p(x_ix_j) = 0 \ \text{ when } i \ne j. \tag{1.21}$$
In a complete event group, we can assume that $0 < p(x_i) \le 1$ $(1 \le i \le n)$.
Total probability formula: If $\{x_1, x_2, \ldots, x_n\}$ is a complete event group and $y$ is any random event, then we have
$$p(y) = \sum_{i=1}^{n} p(x_iy) \tag{1.22}$$
and
$$p(y) = \sum_{i=1}^{n} p(x_i)\,p(y|x_i). \tag{1.23}$$
Lemma 1.12 Let $\{x_1, x_2, \ldots, x_n\}$ be a complete event group. Then the event $y$ can occur only together with some $x_i$, so for any $i$, $1 \le i \le n$, we have the following Bayes formula (provided $p(y) > 0$):
$$p(x_i|y) = \frac{p(x_i)\,p(y|x_i)}{\sum_{j=1}^{n} p(x_j)\,p(y|x_j)}, \quad 1 \le i \le n. \tag{1.24}$$


Proof From the product formula (1.20), we have
$$p(x_iy) = p(y)\,p(x_i|y) = p(x_i)\,p(y|x_i),$$
so
$$p(x_i|y) = \frac{p(x_i)\,p(y|x_i)}{p(y)}.$$
Substituting the total probability formula (1.23) for $p(y)$, we get
$$p(x_i|y) = \frac{p(x_i)\,p(y|x_i)}{\sum_{j=1}^{n} p(x_j)\,p(y|x_j)}, \quad 1 \le i \le n,$$
and the Bayes formula (1.24) is proved.
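As an illustration of (1.23) and (1.24), the following sketch computes posterior probabilities with exact fractions. The numbers form a hypothetical example, not one from the text: a bit is sent with prior probabilities $3/4$ (for 0) and $1/4$ (for 1) over a channel that flips it with probability $1/10$, and $y$ is the event that a 1 is received. The name `bayes_posterior` is illustrative.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood):
    """Return p(x_i | y) from p(x_i) and p(y | x_i) using (1.23) and (1.24)."""
    joint = [px * py for px, py in zip(prior, likelihood)]  # p(x_i) p(y | x_i) = p(x_i y)
    p_y = sum(joint)                                        # total probability formula (1.23)
    if p_y == 0:
        raise ValueError("p(y) = 0, so p(x_i | y) is undefined")
    return [j / p_y for j in joint]

# Hypothetical example: x_1 = "0 was sent", x_2 = "1 was sent", y = "1 was received".
prior = [Fraction(3, 4), Fraction(1, 4)]
likelihood = [Fraction(1, 10), Fraction(9, 10)]   # p(y | x_1), p(y | x_2)
post = bayes_posterior(prior, likelihood)
print(post)   # [Fraction(1, 4), Fraction(3, 4)]
```

Even though a 1 is much more likely to be received correctly, the small prior of $x_2$ keeps its posterior at $3/4$.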


Now we discuss the $n$-fold Bernoulli experiment. In statistical testing, a trial with only two possible outcomes is called a Bernoulli experiment, and an experiment satisfying the following conditions is called an $n$-fold Bernoulli experiment:

(1) Each trial has exactly two possible outcomes: $a$ or $\bar{a}$.

(2) The probability $p$ that $a$ occurs in each trial remains unchanged.

(3) The trials are statistically independent.

(4) A total of $n$ trials are carried out.


Lemma 1.13 (Bernoulli theorem) In a Bernoulli experiment, let the probability of event $a$ be $p$. Then in the $n$-fold Bernoulli experiment, the probability $B(k; n, p)$ that $a$ occurs exactly $k$ $(0 \le k \le n)$ times is
$$B(k; n, p) = \binom{n}{k}p^kq^{n-k}, \quad q = 1 - p. \tag{1.25}$$


Proof Record the result of the $i$-th Bernoulli trial as $x_i$ ($x_i = a$ or $\bar{a}$). The $n$-fold Bernoulli experiment then forms the joint event
$$x = x_1x_2\cdots x_n, \quad x_i = a \text{ or } \bar{a}.$$
Because the trials are independent, when exactly $k$ of the $x_i$ equal $a$, the probability of $x$ is
$$p(x) = p(x_1)p(x_2)\cdots p(x_n) = p^kq^{n-k}.$$
The number of joint events $x$ in which exactly $k$ of the $x_i$ equal $a$ is $\binom{n}{k}$ (the number of ways to choose those $k$ positions), so
$$B(k; n, p) = \binom{n}{k}p^kq^{n-k}.$$
Lemma 1.13 holds.
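Formula (1.25) can be checked against the counting argument of the proof by brute force: enumerate all $2^n$ outcome sequences, keep those with exactly $k$ occurrences of $a$, and add up $p^kq^{n-k}$. The sketch below does this for small $n$ (the values $n = 6$, $p = 0.25$ and the function names are illustrative).

```python
from itertools import product
from math import comb

def B(k, n, p):
    """Closed form (1.25): C(n, k) p^k q^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def B_bruteforce(k, n, p):
    """Sum p(x) over all joint events x = x_1...x_n with exactly k occurrences of a."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):   # 1 stands for a, 0 for its complement
        if sum(outcome) == k:
            total += p**k * (1 - p) ** (n - k)
    return total

n, p = 6, 0.25
for k in range(n + 1):
    assert abs(B(k, n, p) - B_bruteforce(k, n, p)) < 1e-12
assert abs(sum(B(k, n, p) for k in range(n + 1)) - 1) < 1e-12   # terms of (p + q)^n
```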


In the same way, we can calculate the probability that event $a$ appears for the first time at the $k$-th trial of repeated Bernoulli experiments.


Lemma 1.14 Suppose that $a$ and $\bar{a}$ are the two possible outcomes of a Bernoulli experiment. Then the probability that $a$ appears for the first time in the $k$-th Bernoulli trial is
$$q^{k-1}p, \quad q = 1 - p.$$


Proof Consider the joint event $x = x_1x_2\cdots x_k$ formed by the $k$-fold Bernoulli experiment, where $x_1 = \cdots = x_{k-1} = \bar{a}$ and $x_k = a$. Then
$$p(x) = p(x_1)\cdots p(x_{k-1})\,p(x_k) = q^{k-1}p.$$


We have completed the proof.
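Lemma 1.14 describes a geometric distribution. The sketch below compares the formula $q^{k-1}p$ with a simulation that repeats Bernoulli trials until $a$ first occurs; the parameter $p = 0.2$, the seed and the function names are illustrative assumptions.

```python
import random

def first_success_prob(k, p):
    """Lemma 1.14: probability that a occurs for the first time at trial k."""
    return (1 - p) ** (k - 1) * p

def simulate_first_success(p, rng):
    """Run independent trials until a occurs; return the index of that trial."""
    k = 1
    while rng.random() >= p:   # rng.random() < p means "a occurs"
        k += 1
    return k

rng = random.Random(2024)
p, trials = 0.2, 100_000
counts = {}
for _ in range(trials):
    k = simulate_first_success(p, rng)
    counts[k] = counts.get(k, 0) + 1
for k in range(1, 6):
    print(k, counts.get(k, 0) / trials, first_success_prob(k, p))
```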


The $n$-fold Bernoulli experiment is not only the most basic probability model in probability and statistics, but also a common tool in communication. Next, we take transmission errors over a binary channel as an example.


Example 1.1 (Error probability of a binary channel) In binary channel transmission, a codeword $x$ of length $n$ is a vector $x = (x_1, x_2, \ldots, x_n)$ in the $n$-dimensional vector space $\mathbb{F}_2^n$, where $x_i = 0$ or $1$ $(1 \le i \le n)$. For convenience, we write $x = x_1x_2\cdots x_n$. Due to channel interference, the characters 0 and 1 may be corrupted in transmission, that is, 0 becomes 1 or 1 becomes 0. Let the error probability be $p$ ($p$ may be very small), and assume that it is the same for every transmitted character and that errors in different characters occur independently. Under these assumptions, transmitting a codeword $x$ of length $n$ can be regarded as an $n$-fold Bernoulli experiment, and the probability $B(k; n, p)$ that exactly $k$ $(0 \le k \le n)$ errors occur in $x$ is
$$B(k; n, p) = \binom{n}{k}p^kq^{n-k}, \quad q = 1 - p.$$
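A small simulation makes Example 1.1 concrete. The sketch models the channel as flipping each bit independently with probability $p$, which is the assumption of the example; the codeword, $p = 0.05$, the number of runs and the function names are hypothetical choices. The empirical frequency of exactly $k$ errors should be close to $B(k; n, p)$.

```python
import random
from math import comb

def prob_k_errors(k, n, p):
    """B(k; n, p): probability that exactly k of the n transmitted bits are flipped."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def transmit(codeword, p, rng):
    """Flip each bit independently with probability p (a memoryless binary channel)."""
    return [bit ^ 1 if rng.random() < p else bit for bit in codeword]

x = [1, 0, 1, 1, 0, 0, 1]          # a codeword of length n = 7
n, p, runs = len(x), 0.05, 100_000
rng = random.Random(7)
hist = [0] * (n + 1)
for _ in range(runs):
    y = transmit(x, p, rng)
    hist[sum(a != b for a, b in zip(x, y))] += 1   # number of transmission errors
for k in range(3):
    print(k, hist[k] / runs, prob_k_errors(k, n, p))
```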


1.6 Chebyshev Inequality 


A variable $\xi$ on a probability space that takes real values is called a random variable. For any real number $x \in (-\infty, +\infty)$, $p(x)$ is defined as the probability that the random variable $\xi$ takes the value $x$, i.e.,
$$p(x) = P\{\xi = x\}, \tag{1.26}$$
and $p(x)$ is called the probability function of $\xi$. If $\xi$ takes only finitely many or countably infinitely many values, that is, the value space of $\xi$ consists of finitely many or countably infinitely many real numbers, then $\xi$ is called a discrete random variable; otherwise, $\xi$ is called a continuous random variable. The distribution function $F(x)$ of a random variable $\xi$ is defined as
$$F(x) = P\{\xi \le x\}, \quad x \in (-\infty, +\infty). \tag{1.27}$$
Obviously, the distribution function $F(x)$ is a monotone nondecreasing function on the whole real axis $(-\infty, +\infty)$, and it is right continuous, that is, $F(x_0) = \lim_{x \to x_0^+}F(x)$. The probability distribution of $\xi$ is completely determined by its distribution function $F(x)$; in fact, for any $x$,
$$p(x) = P\{\xi = x\} = F(x) - F(x - 0).$$




Let $f(x)$ be a nonnegative integrable function on the real axis such that
$$F(x) = \int_{-\infty}^{x} f(t)\,dt; \tag{1.28}$$
then $f(x)$ is called the density function of the random variable $\xi$. Obviously, the density function satisfies
$$f(x) \ge 0, \ \forall x \in (-\infty, +\infty), \qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1. \tag{1.29}$$
Conversely, any function $f(x)$ satisfying (1.29) is the density function of some random variable. Here we introduce several common continuous random variables and their probability distributions.

1. Uniform distribution (equal probability distribution)

A random variable $\xi$ that takes values in the interval $[a, b]$ with equal probability is said to be uniformly distributed (or a uniformly distributed random variable), and its density function is
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b,\\ 0, & \text{otherwise}. \end{cases}$$
Its distribution function $F(x)$ is
$$F(x) = \begin{cases} 0, & x < a,\\ \dfrac{x-a}{b-a}, & a \le x < b,\\ 1, & x \ge b. \end{cases}$$


2. Exponential distribution
The density function of the random variable $\xi$ is
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0, \end{cases}$$
where $\lambda > 0$ is a given parameter, and its distribution function is
$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0,\\ 0, & x < 0. \end{cases}$$
We say that $\xi$ has an exponential distribution with parameter $\lambda$, or that $\xi$ is an exponentially distributed random variable.
3. Normal distribution
A continuous random variable $\xi$ whose density function $f(x)$ is defined as
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in (-\infty, +\infty),$$
where $\mu$ and $\sigma$ are constants and $\sigma > 0$, is said to obey the normal distribution with parameters $\mu$ and $\sigma^2$, denoted $\xi \sim N(\mu, \sigma^2)$. By the Poisson integral
$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi},$$
it is not hard to verify that
$$\int_{-\infty}^{+\infty} f(x)\,dx = 1.$$
The distribution function $F(x)$ of the normal distribution $N(\mu, \sigma^2)$ is
$$F(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt.$$
When $\mu = 0$ and $\sigma = 1$, $N(0, 1)$ is called the standard normal distribution.
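In practice $F(x)$ for $N(\mu, \sigma^2)$ is usually evaluated through the error function, using the identity $F(x) = \frac{1}{2}\big(1 + \operatorname{erf}\big(\frac{x-\mu}{\sigma\sqrt{2}}\big)\big)$. The sketch below (function names illustrative) uses Python's `math.erf` and compares it with a direct midpoint-rule integration of the density given above, truncated at $\mu - 12\sigma$, where the neglected mass is negligible.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) for N(mu, sigma^2) via F(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_cdf_numeric(x, mu=0.0, sigma=1.0, steps=100_000):
    """The same F(x) by the midpoint rule applied to the density, starting at mu - 12 sigma."""
    lo = mu - 12.0 * sigma
    h = (x - lo) / steps
    c = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += c * math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
    return total * h

print(normal_cdf(1.0), normal_cdf_numeric(1.0))
```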

Let us define the mathematical expectation and variance of a random variable $\xi$. First, consider the mathematical expectation of a discrete random variable $\xi$.

(1) Let $\xi$ be a discrete random variable whose value space is $\{x_1, x_2, \ldots, x_n, \ldots\}$, and let $p(x_i) = P\{\xi = x_i\}$. If
$$\sum_{i=1}^{+\infty}|x_i|\,p(x_i) < \infty,$$
then the mathematical expectation $E(\xi)$ of $\xi$ is defined as
$$E\xi = E(\xi) = \sum_{i=1}^{+\infty}x_i\,p(x_i). \tag{1.30}$$


(2) Let $\xi$ be a continuous random variable with density function $f(x)$. If
$$\int_{-\infty}^{+\infty}|x|\,f(x)\,dx < +\infty,$$
then the mathematical expectation $E(\xi)$ of $\xi$ is defined as
$$E\xi = E(\xi) = \int_{-\infty}^{+\infty}x\,f(x)\,dx. \tag{1.31}$$


(3) Let $h(x)$ be a real valued function; then $h(\xi)$ is also a random variable, called a function of the random variable $\xi$. The mathematical expectation $E(h(\xi))$ of $h(\xi)$ is denoted by $E_h(\xi)$.


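Lemma 1.15 (1) Let $\xi$ be a discrete random variable whose value space is $\{x_1, x_2, \ldots, x_n, \ldots\}$. If $\sum_{i=1}^{+\infty}|h(x_i)|\,p(x_i) < \infty$, then $E_h(\xi)$ exists, and
$$E_h(\xi) = \sum_{i=1}^{+\infty}h(x_i)\,p(x_i).$$
(2) If $\xi$ is a continuous random variable with density function $f(x)$ and $\int_{-\infty}^{+\infty}|h(x)|\,f(x)\,dx < +\infty$, then $E_h(\xi)$ exists, and
$$E_h(\xi) = \int_{-\infty}^{+\infty}h(x)\,f(x)\,dx.$$


Proof Let the value space of $\eta = h(\xi)$ be $\{y_1, y_2, \ldots, y_n, \ldots\}$. Then
$$P\{\eta = y_j\} = P\Big(\bigcup_{i:\,h(x_i) = y_j}\{\xi = x_i\}\Big) = \sum_{i:\,h(x_i) = y_j}P\{\xi = x_i\}.$$
By the definition of $E(\eta)$,
$$E_h(\xi) = E(\eta) = \sum_{j=1}^{+\infty}y_j\,P\{\eta = y_j\} = \sum_{j=1}^{+\infty}\ \sum_{i:\,h(x_i) = y_j}h(x_i)\,p(x_i) = \sum_{i=1}^{+\infty}h(x_i)\,p(x_i),$$
where the regrouping is justified by absolute convergence. (2) can be proved in the same way.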
The following basic properties of mathematical expectation are easy to prove.


Lemma 1.16 (1) If $\xi = c$ is a constant, then $E(\xi) = c$.
(2) If $a$ and $b$ are constants and the expectations of the random variables $\xi$ and $\eta$ exist, then $E(a\xi + b\eta) = aE(\xi) + bE(\eta)$.
(3) If $a \le \xi \le b$, then $a \le E(\xi) \le b$.


If the mathematical expectation $E(\xi)$ of a random variable $\xi$ exists, then $(\xi - E\xi)^2$ is also a random variable (take $h(x) = (x - a)^2$, where $a = E\xi$). We define the mathematical expectation $E_h(\xi)$ of $h(\xi)$ as the variance of $\xi$, denoted by $D(\xi)$, that is,
$$D(\xi) = E\big((\xi - E\xi)^2\big).$$
$\sigma = \sqrt{D(\xi)}$ is called the standard deviation of $\xi$. Here are some basic properties of the variance.


Lemma 1.17 (1) $D(\xi) = E(\xi^2) - E^2(\xi)$.
(2) If $\xi = a$ is a constant, then $D(\xi) = 0$.
(3) $D(\xi + c) = D(\xi)$.

(4) $D(c\xi) = c^2D(\xi)$.
(5) If $c \ne E\xi$, then $D(\xi) < E\big((\xi - c)^2\big)$.


Proof (1) follows from the definition:
$$\begin{aligned} D(\xi) &= E\big((\xi - E\xi)^2\big)\\ &= E\big(\xi^2 - 2\xi E\xi + E^2(\xi)\big)\\ &= E(\xi^2) - 2(E\xi)^2 + (E\xi)^2\\ &= E(\xi^2) - (E\xi)^2. \end{aligned}$$
(2) is trivial. Let us prove (3). By (1),
$$\begin{aligned} D(\xi + c) &= E\big((\xi + c)^2\big) - \big(E(\xi + c)\big)^2\\ &= E(\xi^2 + 2c\xi + c^2) - \big((E\xi)^2 + 2cE(\xi) + c^2\big)\\ &= E(\xi^2) + 2cE(\xi) + c^2 - (E\xi)^2 - 2cE(\xi) - c^2\\ &= E(\xi^2) - (E\xi)^2 = D(\xi). \end{aligned}$$
(4) can also be derived directly from (1). In fact,
$$\begin{aligned} D(c\xi) &= E(c^2\xi^2) - \big(E(c\xi)\big)^2\\ &= c^2E(\xi^2) - c^2(E\xi)^2\\ &= c^2D(\xi). \end{aligned}$$
To prove (5), note that by Lemma 1.16, $E(\xi - c) = E\xi - c$. So if $c \ne E(\xi)$, then by (3) and (1),
$$D(\xi) = D(\xi - c) = E\big((\xi - c)^2\big) - (E\xi - c)^2.$$
Since the last term of the above formula is nonzero, we always have
$$D(\xi) < E\big((\xi - c)^2\big).$$
(5) holds. This property shows that $E\big((\xi - c)^2\big)$ attains its minimum value $D(\xi)$ at $c = E\xi$. We have completed the proof.
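Properties (1), (3), (4) and (5) of Lemma 1.17 can be verified exactly on a small example with rational arithmetic. The three-point distribution below is hypothetical, and `expectation` and `variance` are illustrative helpers that implement Lemma 1.15 (1) for a finite value space.

```python
from fractions import Fraction

def expectation(dist, h=lambda x: x):
    """E h(xi) = sum h(x_i) p(x_i) for a finite distribution {value: probability}."""
    return sum(h(x) * p for x, p in dist.items())

def variance(dist):
    """D(xi) = E((xi - E xi)^2)."""
    m = expectation(dist)
    return expectation(dist, lambda x: (x - m) ** 2)

# A hypothetical three-point distribution with exact rational probabilities.
dist = {Fraction(-1): Fraction(1, 5), Fraction(2): Fraction(1, 2), Fraction(5): Fraction(3, 10)}
E, D = expectation(dist), variance(dist)
c = Fraction(7, 3)                                  # any constant with c != E
shifted = {x + c: p for x, p in dist.items()}       # distribution of xi + c
scaled = {c * x: p for x, p in dist.items()}        # distribution of c * xi

assert D == expectation(dist, lambda x: x * x) - E ** 2   # property (1)
assert variance(shifted) == D                             # property (3)
assert variance(scaled) == c ** 2 * D                     # property (4)
assert D < expectation(dist, lambda x: (x - c) ** 2)      # property (5)
print(E, D)
```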


Now we give the main results of this section, which in mathematics are called Chebyshev-type inequalities. They are essentially moment inequalities, because the mathematical expectation $E\xi$ of a random variable $\xi$ is its first-order origin moment and the variance is its second-order central moment.


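Theorem 1.1 Let $h(x)$ be a nonnegative real valued function of $x$, let $\xi$ be a random variable, and suppose that the expectation $E_h(\xi)$ of $h(\xi)$ exists. Then for any $\varepsilon > 0$, we have
$$P\{h(\xi) \ge \varepsilon\} \le \frac{E_h(\xi)}{\varepsilon} \tag{1.32}$$
and
$$P\{h(\xi) > \varepsilon\} \le \frac{E_h(\xi)}{\varepsilon}. \tag{1.33}$$


Proof We prove the theorem only for a continuous random variable $\xi$. Let $f(x)$ be the density function of $\xi$; then by Lemma 1.15,
$$\begin{aligned} E_h(\xi) &= \int_{-\infty}^{+\infty}h(x)\,f(x)\,dx\\ &\ge \int_{h(x) \ge \varepsilon}h(x)\,f(x)\,dx\\ &\ge \varepsilon\int_{h(x) \ge \varepsilon}f(x)\,dx\\ &= \varepsilon P\{h(\xi) \ge \varepsilon\}, \end{aligned}$$
so (1.32) holds. Since the event $\{h(\xi) > \varepsilon\}$ is contained in $\{h(\xi) \ge \varepsilon\}$, (1.33) follows from (1.32).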

In the theorem, different choices of $h$ (for example, functions of $\xi - E\xi$) give different Chebyshev inequalities.
Corollary 1.1 (Chebyshev) If the variance $D(\xi)$ of the random variable $\xi$ exists, then for any $\varepsilon > 0$, we have
$$P\{|\xi - E\xi| \ge \varepsilon\} \le \frac{D(\xi)}{\varepsilon^2}. \tag{1.34}$$
Proof Take $h(\xi) = (\xi - E\xi)^2$ in Theorem 1.1. Then $|\xi - E\xi| \ge \varepsilon$ if and only if $h(\xi) \ge \varepsilon^2$, and by definition $E_h(\xi) = D(\xi)$. Thus
$$P\{|\xi - E\xi| \ge \varepsilon\} = P\{h(\xi) \ge \varepsilon^2\} \le \frac{D(\xi)}{\varepsilon^2}.$$


The Corollary holds.


Corollary 1.2 (Chebyshev) Suppose that both the expected value $E\xi$ and the variance $D(\xi) > 0$ of the random variable $\xi$ exist. Then for any $k > 0$, we have
$$P\{|\xi - E\xi| \ge k\sqrt{D(\xi)}\} \le \frac{1}{k^2}. \tag{1.35}$$


Proof Take $\varepsilon = k\sqrt{D(\xi)}$ in Corollary 1.1; then
$$P\{|\xi - E\xi| \ge k\sqrt{D(\xi)}\} \le \frac{D(\xi)}{k^2D(\xi)} = \frac{1}{k^2}.$$


Corollary 1.2 holds.


In mathematics, $\mu$ is often used to denote the expected value and $\sigma = \sqrt{D(\xi)}$ $(\sigma \ge 0)$ the standard deviation, that is,
$$\mu = E\xi, \quad \sigma = \sqrt{D(\xi)}, \quad \sigma \ge 0.$$
Then the Chebyshev inequality in Corollary 1.2 can be written as follows:
$$P\{|\xi - \mu| \ge k\sigma\} \le \frac{1}{k^2}. \tag{1.36}$$
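To see how conservative (1.36) can be, the sketch below computes the exact tail probability $P\{|\xi - \mu| \ge k\sigma\}$ for a binomial random variable $\xi \sim b(100, 1/2)$, using the mean $np$ and variance $npq$ from Lemma 1.18, and compares it with $1/k^2$. The parameters and the function name are illustrative.

```python
from math import comb, sqrt

def binomial_tail_outside(n, p, k):
    """Exact P{|xi - mu| >= k sigma} for xi ~ b(n, p), with mu = np, sigma = sqrt(npq)."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(n + 1) if abs(j - mu) >= k * sigma)

n, p = 100, 0.5
for k in (1.5, 2, 3):
    exact = binomial_tail_outside(n, p, k)
    print(k, exact, 1 / k**2)
    assert exact <= 1 / k**2   # Chebyshev bound (1.36)
```

The exact tails are several times smaller than $1/k^2$; the strength of (1.36) is that it holds for every distribution with finite variance.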


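Corollary 1.3 (Markov) If, for a positive integer $k \ge 1$, the expectation $E|\xi|^k$ of the random variable $\xi$ exists, then for any $\varepsilon > 0$,
$$P\{|\xi| \ge \varepsilon\} \le \frac{E|\xi|^k}{\varepsilon^k}.$$


Proof Take $h(\xi) = |\xi|^k$ in Theorem 1.1 and replace $\varepsilon$ with $\varepsilon^k$. Since $|\xi| \ge \varepsilon$ if and only if $|\xi|^k \ge \varepsilon^k$, the Markov inequality follows directly from Theorem 1.1.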


Next, we introduce several common discrete random variables and their probability distributions, and calculate their expected values and variances.


Example 1.2 (Degenerate distribution) A random variable $\xi$ takes a constant value $a$ with probability 1, that is, $\xi = a$, $P\{\xi = a\} = 1$; $\xi$ is said to have a degenerate distribution. By Lemma 1.16 (1), $E\xi = a$, and by Lemma 1.17 (2), its variance is $D(\xi) = 0$.
Example 1.3 (Two-point distribution) A random variable $\xi$ takes only two values $\{x_1, x_2\}$, and its probability distribution is
$$P\{\xi = x_1\} = p, \quad P\{\xi = x_2\} = 1 - p, \quad 0 < p < 1.$$
$\xi$ is said to have a two-point distribution with parameter $p$, and its mathematical expectation and variance are
$$E(\xi) = x_1p + x_2(1 - p), \qquad D(\xi) = p(1 - p)(x_1 - x_2)^2.$$
In particular, taking $x_1 = 1$, $x_2 = 0$, the expected value and variance of the two-point distribution are
$$E(\xi) = p, \quad D(\xi) = p(1 - p).$$


Example 1.4 (Equal probability distribution) Let a random variable $\xi$ take $n$ values $\{x_1, x_2, \ldots, x_n\}$ with equal probability, that is,
$$P\{\xi = x_i\} = \frac{1}{n}, \quad 1 \le i \le n.$$
$\xi$ is said to have the equal probability distribution, or uniform distribution, on the $n$ points $x_1, x_2, \ldots, x_n$. Its expected value and variance are
$$E(\xi) = \frac{1}{n}\sum_{i=1}^{n}x_i, \qquad D(\xi) = \frac{1}{n}\sum_{i=1}^{n}\big(x_i - E(\xi)\big)^2.$$
i=1 i=1 


Example 1.5 (Binomial distribution) In the $n$-fold Bernoulli experiment, the number of times $\xi$ that event $a$ occurs is a random variable taking values from 0 to $n$. Its probability distribution is (see Lemma 1.13)
$$P\{\xi = k\} = b(k; n, p) = \binom{n}{k}p^kq^{n-k}, \quad q = 1 - p,$$
where $0 \le k \le n$ and $p$ is the probability that event $a$ occurs in each trial. $\xi$ is said to have a binomial distribution with parameters $n, p$, denoted by $\xi \sim b(n, p)$. In fact, $b(k; n, p)$ is the general term in the binomial expansion of $(p + q)^n$.


Lemma 1.18 Let $\xi \sim b(n, p)$. Then
$$E(\xi) = np, \quad D(\xi) = npq, \quad q = 1 - p.$$
Proof By definition,
$$\begin{aligned} E(\xi) &= \sum_{k=0}^{n}k\,b(k; n, p) = \sum_{k=1}^{n}k\binom{n}{k}p^kq^{n-k}\\ &= np\sum_{k=1}^{n}\binom{n-1}{k-1}p^{k-1}q^{(n-1)-(k-1)}\\ &= np\sum_{k=0}^{n-1}b(k; n-1, p)\\ &= np. \end{aligned}$$
Similarly, using $E\big(\xi(\xi - 1)\big) = n(n-1)p^2$, one can calculate
$$E(\xi^2) = \sum_{k=0}^{n}k^2\,b(k; n, p) = n^2p^2 + npq,$$
thus
$$D(\xi) = E(\xi^2) - (E\xi)^2 = npq.$$
We have completed the proof.
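Lemma 1.18 can be confirmed with exact rational arithmetic by computing $E(\xi)$ and $E(\xi^2)$ straight from the definition. In the sketch below, the value $p = 2/7$ and the function name are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Return (E xi, E xi^2) for xi ~ b(n, p), computed directly from the definition."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    m1 = sum(k * w for k, w in enumerate(pmf))
    m2 = sum(k * k * w for k, w in enumerate(pmf))
    return m1, m2

p = Fraction(2, 7)
q = 1 - p
for n in range(1, 15):
    m1, m2 = binomial_moments(n, p)
    assert m1 == n * p                        # E(xi) = np
    assert m2 == n * n * p * p + n * p * q    # E(xi^2) = n^2 p^2 + npq
    assert m2 - m1 ** 2 == n * p * q          # D(xi) = npq
```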


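Lemma 1.19 Let $p_n$ be the probability of event $a$ in the $n$-fold Bernoulli experiment. If $np_n \to \lambda$ as $n \to \infty$, then we have
$$\lim_{n\to\infty}b(k; n, p_n) = \frac{\lambda^k}{k!}e^{-\lambda}.$$
Proof Write $\lambda_n = np_n$. Then
$$\begin{aligned} b(k; n, p_n) &= \binom{n}{k}p_n^k(1 - p_n)^{n-k}\\ &= \frac{n(n-1)\cdots(n-k+1)}{k!}\Big(\frac{\lambda_n}{n}\Big)^k\Big(1 - \frac{\lambda_n}{n}\Big)^{n-k}\\ &= \frac{\lambda_n^k}{k!}\Big(1 - \frac{1}{n}\Big)\Big(1 - \frac{2}{n}\Big)\cdots\Big(1 - \frac{k-1}{n}\Big)\Big(1 - \frac{\lambda_n}{n}\Big)^{n-k}. \end{aligned}$$
For fixed $k$, we have $\lim_{n\to\infty}\lambda_n^k = \lambda^k$ and
$$\lim_{n\to\infty}\Big(1 - \frac{\lambda_n}{n}\Big)^n = e^{-\lambda},$$
and also
$$\lim_{n\to\infty}\Big(1 - \frac{1}{n}\Big)\cdots\Big(1 - \frac{k-1}{n}\Big)\Big(1 - \frac{\lambda_n}{n}\Big)^{-k} = 1.$$
So
$$\lim_{n\to\infty}b(k; n, p_n) = \frac{\lambda^k}{k!}e^{-\lambda},$$
and Lemma 1.19 holds.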


Example 1.6 (Poisson distribution) Let the discrete random variable $\xi$ take the values $0, 1, \ldots, n, \ldots$, and let $\lambda > 0$ be a real number. If the probability distribution of $\xi$ is
$$P\{\xi = k\} = p(k; \lambda) = \frac{\lambda^k}{k!}e^{-\lambda}, \quad k = 0, 1, 2, \ldots,$$
then $\xi$ is called a random variable obeying the Poisson distribution. It can be proved that the expected value and the variance of a Poisson random variable $\xi$ are both $\lambda$. By Lemma 1.19, when $n$ is large and $p$ is very small, the number of occurrences $\xi_n$ in the $n$-fold Bernoulli experiment is approximately Poisson distributed. In this case, the probability $b(k; n, p)$ can be approximated by the Poisson distribution, that is,
$$b(k; n, p) \approx \frac{(np)^k}{k!}e^{-np}.$$
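The Poisson approximation of Lemma 1.19 and Example 1.6 can be watched numerically by keeping $np_n = \lambda$ fixed and letting $n$ grow. The sketch below (with the illustrative choice $\lambda = 2$) prints the largest difference between $b(k; n, \lambda/n)$ and $p(k; \lambda)$ over $0 \le k \le 10$; the function names are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """b(k; n, p) from (1.25)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """p(k; lambda) = lambda^k e^(-lambda) / k!."""
    return lam**k * exp(-lam) / factorial(k)

lam = 2.0
for n in (10, 100, 1000, 10_000):
    p = lam / n   # keeps n * p_n = lambda, as in Lemma 1.19
    worst = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(n, worst)
```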
A so-called stochastic process considers the joint statistical characteristics of a family of random variables $\{\xi_i\}_{i=1}^{n}$, which we can describe as an $n$-dimensional random vector. Let $\{\xi_i\}_{i=1}^{n}$ be $n$ random variables on a given probability space; then $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ is called an $n$-dimensional random vector with values in $\mathbb{R}^n$.
A stochastic process, or $n$-dimensional random vector, $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ is uniquely determined by the occurrence probabilities of the following joint events. Let $A(\xi_i) \subset \mathbb{R}$ be the value space of the random variable $\xi_i$ $(1 \le i \le n)$. Then for any $(x_1, x_2, \ldots, x_n) \in A(\xi_1) \times A(\xi_2) \times \cdots \times A(\xi_n) \subset \mathbb{R}^n$, the probability of the following joint event is denoted by
$$p(x_1x_2\cdots x_n) = p\big((x_1, x_2, \ldots, x_n)\big) = P\{\xi_1 = x_1, \xi_2 = x_2, \ldots, \xi_n = x_n\}.$$


Definition 1.7 If for any $x_i \in \mathbb{R}$ $(1 \le i \le n)$ we have
$$p(x_1x_2\cdots x_n) = p(x_1)p(x_2)\cdots p(x_n),$$
then the stochastic process $\{\xi_i\}_{i=1}^{n}$ is called statistically independent.


Strictly speaking, in Definition 1.7 each $x_i$ should range over the Borel sets of the real line (with $\{\xi_i = x_i\}$ read as $\{\xi_i \in x_i\}$), to ensure that the event generated by $\xi_i$ is an event in the given probability space.


Similarly, we can define a function $F(x_1, x_2, \ldots, x_n)$ on $\mathbb{R}^n$ by
$$F(x_1, x_2, \ldots, x_n) = P\{\xi_1 \le x_1, \xi_2 \le x_2, \ldots, \xi_n \le x_n\}.$$
This is the distribution function of the random vector $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$. Its marginal distribution functions are
$$F_i(x_i) = P\{\xi_i \le x_i\} = F(+\infty, \ldots, +\infty, x_i, +\infty, \ldots, +\infty).$$


We state the following properties of stochastic processes without proof; the reader can find them in classical probability textbooks (see reference 1's Rényi 1970, Li 2010, Long 2020).


Lemma 1.20 (1) A stochastic process $\{\xi_i\}_{i=1}^{n}$ is statistically independent if and only if
$$F(x_1, x_2, \ldots, x_n) = F_1(x_1)F_2(x_2)\cdots F_n(x_n).$$
(2) If $\{\xi_i\}_{i=1}^{n}$ is statistically independent, then for any real valued functions $g_i(x)$, $\{g_i(\xi_i)\}_{i=1}^{n}$ is also statistically independent.
(3) If $\xi_1, \xi_2, \ldots, \xi_n$ are $n$ random variables whose expected values exist, then
$$E(\xi_1 + \xi_2 + \cdots + \xi_n) = E(\xi_1) + E(\xi_2) + \cdots + E(\xi_n).$$
(4) If $\{\xi_i\}_{i=1}^{n}$ is statistically independent and the expected value $E(\xi_i)$ of each random variable exists, then the mathematical expectation of the product $\xi_1\xi_2\cdots\xi_n$ exists, and
$$E(\xi_1\xi_2\cdots\xi_n) = E(\xi_1)E(\xi_2)\cdots E(\xi_n).$$


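Definition 1.8 Let $\{\xi_n\}_{n=1}^{\infty}$ be a sequence of random variables and $\xi$ a given random variable. If for any $\varepsilon > 0$ we have
$$\lim_{n\to\infty}P\{|\xi_n - \xi| \ge \varepsilon\} = 0,$$
then $\{\xi_n\}$ is said to converge to $\xi$ in probability, denoted by $\xi_n \xrightarrow{P} \xi$.
Obviously, $\xi_n \xrightarrow{P} \xi$ if and only if for any $\varepsilon > 0$,
$$\lim_{n\to\infty}P\{|\xi_n - \xi| < \varepsilon\} = 1.$$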
If the occurrence probability of an event is $p$, the frequency of the event in repeated statistical trials gradually approaches $p$. The rigorous mathematical statement and proof of this fact are known as Bernoulli's law of large numbers.


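Theorem 1.2 (Bernoulli) Let $\mu_n$ be the number of occurrences of event $a$ in the $n$-fold Bernoulli experiment, where the probability that $a$ occurs in each trial is $p$ $(0 < p < 1)$. Then the frequency $\mu_n/n$ of $a$ converges in probability to $p$, that is, for any $\varepsilon > 0$,
$$\lim_{n\to\infty}P\Big\{\Big|\frac{\mu_n}{n} - p\Big| \ge \varepsilon\Big\} = 0.$$
Proof Consider $\mu_n/n$ as a random variable. Since $\mu_n \sim b(n, p)$, by Lemma 1.18 its expected value and variance are
$$E\Big(\frac{\mu_n}{n}\Big) = \frac{1}{n}E(\mu_n) = p$$
and
$$D\Big(\frac{\mu_n}{n}\Big) = \frac{1}{n^2}D(\mu_n) = \frac{pq}{n}, \quad q = 1 - p,$$
respectively. By the Chebyshev inequality (1.34), we have
$$P\Big\{\Big|\frac{\mu_n}{n} - p\Big| \ge \varepsilon\Big\} \le \frac{pq}{n\varepsilon^2}.$$
Hence, for any given $\varepsilon > 0$,
$$\lim_{n\to\infty}P\Big\{\Big|\frac{\mu_n}{n} - p\Big| \ge \varepsilon\Big\} = 0.$$
So Bernoulli's law of large numbers holds.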
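A Monte Carlo sketch of Theorem 1.2: for growing $n$ it estimates $P\{|\mu_n/n - p| \ge \varepsilon\}$ and prints it next to the Chebyshev bound $pq/(n\varepsilon^2)$ from the proof (capped at 1). The values $p = 0.3$, $\varepsilon = 0.05$, the seed, the number of repetitions and the function names are illustrative assumptions.

```python
import random

def frequency(n, p, rng):
    """Relative frequency mu_n / n of a in n independent trials with P(a) = p."""
    return sum(rng.random() < p for _ in range(n)) / n

def estimate_deviation_prob(n, p, eps, reps, rng):
    """Monte Carlo estimate of P{|mu_n / n - p| >= eps} from reps independent experiments."""
    hits = sum(abs(frequency(n, p, rng) - p) >= eps for _ in range(reps))
    return hits / reps

rng = random.Random(42)
p, eps = 0.3, 0.05
for n in (10, 100, 1000):
    est = estimate_deviation_prob(n, p, eps, reps=500, rng=rng)
    bound = p * (1 - p) / (n * eps**2)   # Chebyshev bound from the proof of Theorem 1.2
    print(n, est, min(bound, 1.0))
```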


To understand Bernoulli's law of large numbers better, we can describe it by a stochastic process. Define
$$\xi_i = \begin{cases} 1, & \text{if event } a \text{ occurs in the } i\text{-th trial},\\ 0, & \text{if event } a \text{ does not occur in the } i\text{-th trial}. \end{cases}$$
Then $\xi_i$ follows a two-point distribution with parameter $p$ (see Sect. 1.6, Example 1.3), and $\{\xi_i\}_{i=1}^{\infty}$ is an independent and identically distributed stochastic process. Obviously,
$$\mu_n = \sum_{i=1}^{n}\xi_i, \qquad E(\xi_i) = p.$$
So Bernoulli's law of large numbers can be rewritten as follows:
$$\lim_{n\to\infty}P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n}\xi_i - \frac{1}{n}\sum_{i=1}^{n}E(\xi_i)\Big| < \varepsilon\Big\} = 1,$$
where $\{\xi_i\}$ is a sequence of independent random variables with the same 0-1 two-point distribution with parameter $p$. It is not difficult to generalize this conclusion to a more general case.


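Theorem 1.3 (Chebyshev's law of large numbers) Let $\{\xi_i\}_{i=1}^{\infty}$ be a sequence of independent random variables whose expected values $E(\xi_i)$ and variances $D(\xi_i)$ exist, and suppose that the variances are bounded, i.e., $D(\xi_i) \le C$ for all $i \ge 1$. Then for any $\varepsilon > 0$, we have
$$\lim_{n\to\infty}P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n}\xi_i - \frac{1}{n}\sum_{i=1}^{n}E(\xi_i)\Big| < \varepsilon\Big\} = 1.$$
Proof By the Chebyshev inequality (1.34), and since the variance of a sum of independent random variables is the sum of their variances,
$$\begin{aligned} P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n}\xi_i - \frac{1}{n}\sum_{i=1}^{n}E(\xi_i)\Big| \ge \varepsilon\Big\} &\le \frac{1}{\varepsilon^2}D\Big(\frac{1}{n}\sum_{i=1}^{n}\xi_i\Big) = \frac{1}{n^2\varepsilon^2}\sum_{i=1}^{n}D(\xi_i)\\ &\le \frac{C}{n\varepsilon^2}. \end{aligned}$$
So
$$\lim_{n\to\infty}P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n}\xi_i - \frac{1}{n}\sum_{i=1}^{n}E(\xi_i)\Big| \ge \varepsilon\Big\} = 0,$$
that is, Theorem 1.3 holds.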


Chebyshev's law of large numbers is more general than Bernoulli's. It can be understood as follows: for a sequence of independent random variables $\{\xi_i\}$ with bounded variances, the arithmetic mean of the random variables converges in probability to the arithmetic mean of their expected values.

As a special case, consider an independent and identically distributed stochastic process $\{\xi_i\}$. Since the $\xi_i$ have the same probability distribution, they have the same expectation and variance.
Corollary 1.4 Let $\{\xi_i\}$ be an independent and identically distributed random process with common expectation $\mu$ and variance $\sigma^2$, that is, $E(\xi_i) = \mu$, $D(\xi_i) = \sigma^2$ $(i = 1, 2, \ldots)$. Then
$$\lim_{n\to\infty}P\Big\{\Big|\frac{1}{n}\sum_{i=1}^{n}\xi_i - \mu\Big| < \varepsilon\Big\} = 1,$$
that is, $\frac{1}{n}\sum_{i=1}^{n}\xi_i \xrightarrow{P} \mu$.


In the above corollary, the existence of the variance is unnecessary. Khinchin proved that for an independent and identically distributed stochastic process $\{\xi_i\}$, as long as the expected value $E(\xi_i) = \mu$ exists, $\frac{1}{n}\sum_{i=1}^{n}\xi_i$ converges to $\mu$ in probability. This conclusion is called Khinchin's law of large numbers.

Finally, we state the so-called Lindeberg-Lévy central limit theorem without proof.


Theorem 1.4 (Central limit theorem) Let $\{\xi_i\}$ be an independent and identically distributed stochastic process with expected value $E(\xi_i) = \mu$ and variance $D(\xi_i) = \sigma^2 > 0$ $(i = 1, 2, \ldots)$. Then for any $x$, we have
$$\lim_{n\to\infty}P\Big\{\frac{\sum_{i=1}^{n}\xi_i - n\mu}{\sqrt{n}\,\sigma} \le x\Big\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-\frac{t^2}{2}}\,dt.$$
That is, the distribution function of the standardized sum of $\sum_{i=1}^{n}\xi_i$ converges to that of the standard normal distribution $N(0, 1)$ (convergence in distribution).
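The central limit theorem can be observed by simulation. The sketch below standardizes sums of $n = 30$ independent Uniform$[0, 1]$ variables (mean $1/2$, variance $1/12$; this choice of distribution, the seed, the sample sizes and the function names are illustrative) and compares the empirical distribution function with the standard normal one computed through `math.erf`.

```python
import math
import random

def standardized_sum(n, rng):
    """(sum of xi_i - n mu) / (sqrt(n) sigma) for n i.i.d. Uniform[0, 1] variables."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # mean and standard deviation of Uniform[0, 1]
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (math.sqrt(n) * sigma)

def Phi(x):
    """Distribution function of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = random.Random(3)
n, reps = 30, 20_000
samples = [standardized_sum(n, rng) for _ in range(reps)]
for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(z <= x for z in samples) / reps
    print(x, round(empirical, 3), round(Phi(x), 3))
```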


Exercise 1 (Nie and Ding 2000) 


1. Let $A, B, C$ be three nonempty sets, $\sigma: A \to B$ a given mapping, and $\tau_1$ and $\tau_2$ any two mappings $B \to C$. Prove: if $\sigma$ is surjective and $\tau_1\sigma = \tau_2\sigma$, then $\tau_1 = \tau_2$.

2. Let $\tau_1$ and $\tau_2$ be any two mappings $A \to B$, and $\sigma$ a given mapping $B \to C$. Prove: if $\sigma$ is injective and $\sigma\tau_1 = \sigma\tau_2$, then $\tau_1 = \tau_2$.

3. Let $\sigma: A \to B$ be injective and $\tau: B \to A$ a left inverse of $\sigma$. Is the left inverse $\tau$ of $\sigma$ unique?

4. Let $\sigma: A \to B$ be surjective. Is the right inverse of $\sigma$ unique?

5. Suppose that $a, m, n$ are integers with $a > 0$, $m > 1$, $n > 1$ and $m \ne n$. Prove that
$$\big(a^{2^m} + 1,\ a^{2^n} + 1\big) = 1 \text{ or } 2.$$
Use this to prove Pólya's theorem: there are infinitely many primes.
6. On the set of positive integers, the Möbius function $\mu(n)$ is defined as
$$\mu(n) = \begin{cases} 1, & \text{when } n = 1,\\ 0, & \text{when } n \text{ contains a square factor},\\ (-1)^r, & \text{when } n = p_1p_2\cdots p_r, \text{ where } p_1, \ldots, p_r \text{ are distinct primes}. \end{cases}$$
Prove the Möbius identity
$$\sum_{d\,|\,n}\mu(d) = \begin{cases} 1, & \text{when } n = 1,\\ 0, & \text{when } n > 1. \end{cases}$$

7. Suppose that $\varphi(n)$ is the Euler function. Prove
$$\varphi(n) = n\sum_{d\,|\,n}\frac{\mu(d)}{d}.$$

8. Let $n > 1$ be a positive integer. Prove Wilson's theorem:
$$(n-1)! + 1 \equiv 0 \pmod{n}$$
if and only if $n$ is prime.

9. Let $n$ and $b$ be positive integers with $n > b > 1$. Prove that $n$ can be uniquely expressed as the following $b$-ary number:
$$n = b_0 + b_1b + b_2b^2 + \cdots + b_{r-1}b^{r-1}, \quad 0 \le b_i < b,\ b_{r-1} \ne 0,\ r \ge 1.$$
$n = (b_{r-1}b_{r-2}\cdots b_1b_0)_b$ is called the $b$-ary expression of $n$, and $r$ is called the number of $b$-ary digits of $n$.

10. Let $f(n)$ be a complex valued function on the set of positive integers. Prove the Möbius inversion formula:
$$F(n) = \sum_{d\,|\,n}f(d),\ \forall n \ge 1 \iff f(n) = \sum_{d\,|\,n}\mu(d)F\Big(\frac{n}{d}\Big),\ \forall n \ge 1.$$

11. Prove the following sum formula (for $n > 1$):
$$\sum_{\substack{1 \le r \le n\\ (r, n) = 1}} r = \frac{1}{2}n\varphi(n).$$

12. Prove: there are infinitely many primes $p$ satisfying $p \equiv -1 \pmod 6$.

13. Solve the congruence equation $27x \equiv 25 \pmod{31}$.

14. Let $p$ be a prime and $n \ge 1$ a positive integer. Find the number of solutions of the quadratic congruence equation $x^2 \equiv 1 \pmod{p^n}$.
15. To reduce the number of games, 20 teams are divided into two groups of 10 teams each. Find the probability that the two strongest teams are in the same group, and the probability that they are in different groups.

16. (Banach's match problem) A mathematician has two boxes of matches, each containing $N$ matches. Each time he needs a match, he takes one from a box chosen at random. Find the probability that, when one box is found to be empty, the other box contains $k$ matches.

17. A stick of length $l$ is broken at two random points. Find the probability that the three pieces can form a triangle.

18. There are $k$ jars, each containing $n$ balls numbered from 1 to $n$. One ball is drawn at random from each jar. Find the probability that $m$ is the largest number drawn.

19. Take any three of the five numbers 1, 2, 3, 4, 5 and arrange them in increasing order. Let $X$ denote the number in the middle; find the probability distribution of $X$.

20. Let $F(x)$ be the distribution function of a continuous random variable and $a > 0$. Prove
$$\int_{-\infty}^{+\infty}|F(x + a) - F(x)|\,dx = a.$$

21. (Generalization of Bernoulli's law of large numbers) Let $\mu_n$ be the number of occurrences of event $A$ in the first $n$ trials of a sequence of independent Bernoulli trials, where the probability that $A$ occurs in the $i$-th trial is $p_i$. Write down the corresponding law of large numbers and prove it.
Chapter 2 The Basis of Code Theory


The medium through which information is transmitted is called the channel for short. Commonly used channels include cables, optical fibers, the medium of radio-wave transmission and carrier lines, as well as tapes, optical disks, etc. The channel constitutes the physical condition for social information to be exchanged across space and time. In addition, for a piece of social information, such as language, picture or data information, to be exchanged across time and space, information coding is the basic technical means. What is information coding? In short, it is the process of digitizing all kinds of social information. Digitization is not a simple digital substitution of social information; it is full of profound mathematical principles and beautiful mathematical techniques. For example, the source codes used for data compression and storage use the principles of probability and statistics to attach the required statistical characteristics to social information, so source codes are also called random codes. The other kind is the so-called channel codes, which are used to overcome channel interference. Such codes are full of beautiful techniques from algebra, geometry and combinatorics, aimed at improving the accuracy of information transmission, so channel codes are also called algebraic combinatorial codes. The main purpose of this chapter is to introduce the basic knowledge of coding theory for channel coding. Source coding will be introduced in Chap. 3.

With the hardware support of channels and the software technology of information coding, we can realize the long-distance exchange of all kinds of social information across time and space. Taking channel coding as an example, this process can be described by the following diagram (Fig. 2.1).

[Fig. 2.1 Channel Coding (diagram blocks: channel coding, channel transmission, decoding)]

In 1948, the American mathematician Shannon published his pioneering paper "A Mathematical Theory of Communication" in the Bell System Technical Journal, marking the advent of the era of electronic information. In this paper, Shannon used probability theory to prove the existence of "good codes" whose rate is arbitrarily close to the channel capacity and whose transmission error probability is arbitrarily small (see the theorem in Sect. 2.10 of this chapter); conversely, if the transmission error probability is arbitrarily small, the code rate (transmission efficiency) cannot exceed an upper bound, the channel capacity (see the theorem in Chap. 3). This upper bound is called Shannon's limit, and it is regarded as a golden rule of electronic communication engineering.

Shannon's theorem is an existence proof rather than a constructive one. How to construct good codes that both ensure communication efficiency (a code rate as large as possible) and keep the transmission error rate under control has been the unremitting goal since the advent of Shannon's theory. From Hamming and Golay to Elias, Goppa, Berrou and the Turkish mathematician Arikan, and from Hamming codes and Golay codes to convolutional codes, turbo codes and polar codes, electronic communication has over the past decades reached one peak after another and created one technological miracle after another, up to today's 5G era. In 1969, a U.S. Mars probe used a Hadamard code to transmit image information, and for the first time mankind was able to see picture after picture from outer space. In 1971, the U.S. Jupiter and Saturn probe used the famous Golay code G23 to send hundreds of frames of color photos of Jupiter and Saturn back to Earth. The 70 years of exploration of channel coding form a magnificent history of electronic communication.

The main purpose of this chapter is to define rigorously the mathematical characteristics of general codes and to prove their properties, so as to provide a solid mathematical foundation for further study of coding technology and cryptography. This chapter covers the Hamming distance, the Lee distance, linear codes, some typical good codes, the MacWilliams theorem and the famous Shannon coding theorem. After mastering this chapter, the reader will have a basic and comprehensive understanding of channel coding theory (error-correcting codes).


2.1 Hamming Distance 


In channel coding, the alphabet is usually a q-element finite field $\mathbb{F}_q$, sometimes a ring $\mathbb{Z}_m$, where q is a power of a prime. Let n > 1 be a positive integer; $\mathbb{F}_q^n$ is the n-dimensional linear space over $\mathbb{F}_q$, also called the codeword space:

$$\mathbb{F}_q^n = \{x = (x_1, x_2, \ldots, x_n) \mid x_i \in \mathbb{F}_q,\ 1 \le i \le n\}.$$




A vector $x = (x_1, x_2, \ldots, x_n)$ in $\mathbb{F}_q^n$ is called a codeword of length n. For convenience, we write a codeword as $x = x_1x_2\cdots x_n$; each $x_i \in \mathbb{F}_q$ is called a character. The zero codeword is denoted by $0 = (0, 0, \ldots, 0)$.

For two codewords $x = x_1x_2\cdots x_n$ and $y = y_1y_2\cdots y_n$, the Hamming distance $d(x, y)$ is defined as the number of positions at which the characters of x and y differ, that is

$$d(x, y) = \#\{i \mid 1 \le i \le n,\ x_i \ne y_i\}. \tag{2.1}$$


Obviously, $d(x, y)$ is an integer with $0 \le d(x, y) \le n$. The weight of a codeword $x \in \mathbb{F}_q^n$ is defined as $w(x) = d(x, 0)$, the Hamming distance between x and 0, i.e., the number of nonzero characters of x. The following properties are obvious.


Property 2.1 If $x, y \in \mathbb{F}_q^n$, then

(i) $d(x, y) \ge 0$, and $d(x, y) = 0$ if and only if $x = y$.
(ii) $d(x, y) = d(y, x)$.
(iii) $w(-x) = w(x)$.
(iv) $d(x, y) = d(x - z, y - z)$ for all $z \in \mathbb{F}_q^n$.
(v) $d(x, y) = w(x - y)$.


Property (i) is called nonnegativity, property (ii) symmetry and property (iv) translation invariance. These are the basic properties of a distance function in mathematics, analogous to those of the distance between two points in the plane or in Euclidean space.
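These properties can be checked by brute force on a small space. The sketch below is only an illustration: it assumes the alphabet is $\mathbb{Z}_q$ with q prime, so that arithmetic mod q models $\mathbb{F}_q$, and the helper names (`hamming_distance`, `weight`, `sub_mod`, `check_properties`) are hypothetical. It verifies symmetry, property (v), translation invariance and the triangle inequality proved in Lemma 2.2.

```python
from itertools import product


def hamming_distance(x, y):
    """d(x, y): number of positions where x and y differ, as in (2.1)."""
    if len(x) != len(y):
        raise ValueError("codewords must have the same length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def weight(x):
    """w(x) = d(x, 0): number of nonzero characters of x."""
    return sum(1 for xi in x if xi != 0)


def sub_mod(x, y, q):
    """Componentwise x - y modulo q (models F_q when q is prime)."""
    return tuple((xi - yi) % q for xi, yi in zip(x, y))


def check_properties(q, n):
    """Brute-force check of Property 2.1 (ii), (iv), (v) and Lemma 2.2 on Z_q^n."""
    words = list(product(range(q), repeat=n))
    zero = (0,) * n
    for x in words:
        assert weight(x) == hamming_distance(x, zero)
        for y in words:
            d = hamming_distance(x, y)
            assert d == hamming_distance(y, x)        # symmetry
            assert d == weight(sub_mod(x, y, q))      # property (v)
            for z in words:
                # translation invariance and triangle inequality
                assert d == hamming_distance(sub_mod(x, z, q), sub_mod(y, z, q))
                assert d <= hamming_distance(x, z) + hamming_distance(z, y)
    return True


print(hamming_distance((0, 1, 2, 1), (0, 2, 2, 0)))  # 2
print(check_properties(3, 2))                          # True
```

The enumeration is cubic in $q^n$, so it is meant only for tiny parameters.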


Lemma 2.1 Let $x, y \in \mathbb{F}_q^n$ be two codewords; then

$$w(x \pm y) \le w(x) + w(y).$$

Proof Because $w(-y) = w(y)$ and $w(x - y) = w(x + (-y))$, it suffices to prove $w(x + y) \le w(x) + w(y)$. Let $x = x_1\cdots x_n$, $y = y_1\cdots y_n$; then

$$x + y = (x_1 + y_1)(x_2 + y_2)\cdots(x_n + y_n).$$

Obviously, if $x_i + y_i \ne 0$, then $x_i \ne 0$ or $y_i \ne 0$ ($1 \le i \le n$). Thus $w(x + y) \le w(x) + w(y)$, and

$$w(x - y) = w(x + (-y)) \le w(x) + w(-y) = w(x) + w(y).$$

We have completed the proof.


Lemma 2.2 (Triangle inequality) If $x, y, z \in \mathbb{F}_q^n$ are three codewords, then

$$d(x, y) \le d(x, z) + d(z, y).$$

Proof By Lemma 2.1, for any $z \in \mathbb{F}_q^n$,

$$w(x - y) = w((x - z) + (z - y)) \le w(x - z) + w(z - y).$$

Then by property (v), $d(x, y) = w(x - y)$, we have

$$d(x, y) \le d(x, z) + d(z, y).$$

The Lemma holds.


The nonnegativity, symmetry and translation invariance of the Hamming distance, together with the triangle inequality of Lemma 2.2, show that the Hamming distance between two codewords behaves like the distance between two points in physical space: it is a genuine distance function (metric) in the mathematical sense. Similarly, we can define the concept of a ball. The Hamming ball of radius ρ centered at the codeword x is defined as

$$B_\rho(x) = \{y \mid y \in \mathbb{F}_q^n,\ d(x, y) \le \rho\}, \tag{2.2}$$

where ρ is a nonnegative integer. Obviously, $B_0(x) = \{x\}$ contains only one codeword.


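Lemma 2.3 For any $x \in \mathbb{F}_q^n$ and $0 \le \rho \le n$, we have

$$|B_\rho(x)| = \sum_{i=0}^{\rho}\binom{n}{i}(q - 1)^i, \tag{2.3}$$

where $|B_\rho(x)|$ is the number of codewords in the Hamming ball $B_\rho(x)$.

Proof Let $x = x_1x_2\cdots x_n$. For a given i, $0 \le i \le \rho$, let

$$A_i = \{y \in \mathbb{F}_q^n \mid d(y, x) = i\}.$$

A word y at distance exactly i from x is obtained by choosing the i positions at which y differs from x and, at each of them, one of the $q - 1$ characters different from the corresponding character of x. Hence

$$|A_i| = \binom{n}{i}(q - 1)^i.$$

Obviously, $B_\rho(x)$ is the disjoint union $A_0 \cup A_1 \cup \cdots \cup A_\rho$. So

$$|B_\rho(x)| = \sum_{i=0}^{\rho}|A_i| = \sum_{i=0}^{\rho}\binom{n}{i}(q - 1)^i.$$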


Corollary 2.1 For all $x \in \mathbb{F}_q^n$, we have

$$|B_\rho(x)| = |B_\rho(0)|.$$

That is to say, the number of codewords in $B_\rho(x)$ is a constant that depends only on the radius ρ (for fixed n and q). This constant is usually denoted by $B_\rho$.
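The counting in Lemma 2.3 uses only the size of the alphabet, not its field structure, so a small sketch can enumerate $\mathbb{Z}_q^n$ directly and compare with formula (2.3). The function names are illustrative only.

```python
from itertools import product
from math import comb


def ball_size_formula(n, q, rho):
    """|B_rho(x)| = sum_{i=0}^{rho} C(n, i) (q - 1)^i, formula (2.3)."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(rho + 1))


def ball_size_brute(center, q, rho):
    """Count the words y of Z_q^n with d(center, y) <= rho by enumeration."""
    n = len(center)
    return sum(
        1
        for y in product(range(q), repeat=n)
        if sum(a != b for a, b in zip(center, y)) <= rho
    )


for rho in range(5):
    # Corollary 2.1: the count does not depend on the center.
    assert ball_size_brute((0, 0, 0, 0), 3, rho) == ball_size_formula(4, 3, rho)
    assert ball_size_brute((1, 2, 0, 1), 3, rho) == ball_size_formula(4, 3, rho)
print([ball_size_formula(4, 3, rho) for rho in range(5)])  # [1, 9, 33, 65, 81]
```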
Definition 2.1 If $C \subseteq \mathbb{F}_q^n$, C is called a q-ary code, or code for short, and |C| is the number of codewords in C. If |C| = 1, we call C a trivial code; all the codes we discuss are nontrivial.


For a code C, the following five quantities are of basic importance.


Definition 2.2 If C is a code, define

Code rate of C: $R = R_C = \frac{1}{n}\log_q|C|$
Minimum distance of C: $d = \min\{d(x, y) \mid x, y \in C,\ x \ne y\}$
Minimum weight of C: $w = \min\{w(x) \mid x \in C,\ x \ne 0\}$
Covering radius of C: $\rho = \max\{\min\{d(x, c) \mid c \in C\} \mid x \in \mathbb{F}_q^n\}$
Disjoint radius of C: $\rho_1 = \max\{r \mid 0 \le r \le n,\ B_r(c_1) \cap B_r(c_2) = \emptyset,\ \forall c_1, c_2 \in C,\ c_1 \ne c_2\}$
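For small codes, all five quantities of Definition 2.2 can be computed by brute force directly from the definitions. The sketch below assumes the alphabet is $\mathbb{Z}_q$ and that the code has at least one nonzero codeword; `code_parameters` is a hypothetical helper, not part of the text.

```python
from itertools import product
from math import log


def dist(x, y):
    return sum(a != b for a, b in zip(x, y))


def code_parameters(code, q):
    """Brute-force (R, d, w, rho, rho1) of Definition 2.2 for a code in Z_q^n."""
    code = [tuple(c) for c in code]
    n = len(code[0])
    space = list(product(range(q), repeat=n))
    rate = log(len(code), q) / n
    pairs = [(x, y) for i, x in enumerate(code) for y in code[i + 1:]]
    d = min(dist(x, y) for x, y in pairs)
    w = min(sum(a != 0 for a in c) for c in code if any(c))
    rho = max(min(dist(x, c) for c in code) for x in space)

    def balls_disjoint(r):
        # B_r(c1) and B_r(c2) share no word, for every pair c1 != c2
        return not any(dist(x, c1) <= r and dist(x, c2) <= r
                       for c1, c2 in pairs for x in space)

    rho1 = max(r for r in range(n + 1) if balls_disjoint(r))
    return rate, d, w, rho, rho1


print(code_parameters([(0, 0, 0), (1, 1, 1)], 2)[1:])                      # (3, 3, 1, 1)
print(code_parameters([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)], 2)[1:])  # (2, 2, 1, 0)
```

Both examples are consistent with the relations proved below: the first has $d = 2\rho_1 + 1$, the second $d = 2\rho_1 + 2$, and both satisfy $d \le 2\rho + 1$.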


The relationships among these five quantities are important for the study of codes. We begin by proving Lemma 2.4.


Lemma 2.4 Let d be the minimum distance of C and $\rho_1$ the disjoint radius of C; then $d = 2\rho_1 + 1$ or $d = 2\rho_1 + 2$.


Proof It suffices to prove $2\rho_1 + 1 \le d \le 2\rho_1 + 2$. If $d \le 2\rho_1$, then there are codewords $c_1, c_2 \in C$, $c_1 \ne c_2$, such that

$$d(c_1, c_2) \le 2\rho_1.$$

This means that $c_1$ and $c_2$ differ in at most $2\rho_1$ characters. Without loss of generality, we may assume that all positions where they differ lie among the first $2\rho_1$ positions, that is

$$c_1 = a_1a_2\cdots a_{\rho_1}a_{\rho_1+1}\cdots a_{2\rho_1}*\cdots*,$$
$$c_2 = b_1b_2\cdots b_{\rho_1}b_{\rho_1+1}\cdots b_{2\rho_1}*\cdots*,$$

where $*$ represents characters common to both. Put

$$x = a_1\cdots a_{\rho_1}b_{\rho_1+1}\cdots b_{2\rho_1}*\cdots*;$$

then

$$d(x, c_1) \le \rho_1, \quad d(x, c_2) \le \rho_1,$$

that is

$$x \in B_{\rho_1}(c_1) \cap B_{\rho_1}(c_2),$$

which contradicts $B_{\rho_1}(c_1) \cap B_{\rho_1}(c_2) = \emptyset$. So we have $d \ge 2\rho_1 + 1$. If $d > 2\rho_1 + 2 = 2(\rho_1 + 1)$, then we can prove the following formula, which contradicts the maximality in the definition of the disjoint radius $\rho_1$:
$$B_{\rho_1+1}(c_1) \cap B_{\rho_1+1}(c_2) = \emptyset, \quad \forall c_1, c_2 \in C,\ c_1 \ne c_2.$$

Indeed, if the above formula did not hold, there would be $c_1, c_2 \in C$, $c_1 \ne c_2$, such that $B_{\rho_1+1}(c_1)$ intersects $B_{\rho_1+1}(c_2)$; take

$$x \in B_{\rho_1+1}(c_1) \cap B_{\rho_1+1}(c_2).$$

Then the triangle inequality of Lemma 2.2 gives

$$d(c_1, c_2) \le d(c_1, x) + d(c_2, x) \le 2(\rho_1 + 1),$$

which contradicts the hypothesis $d > 2(\rho_1 + 1)$. So we have $2\rho_1 + 1 \le d \le 2\rho_1 + 2$. The Lemma holds.


To discuss the geometric meaning of the covering radius ρ, we consider the set $\{B_\rho(c) \mid c \in C\}$ of balls centered at the codewords of C. If

$$\bigcup_{c \in C} B_\rho(c) = \mathbb{F}_q^n,$$

then $\{B_\rho(c) \mid c \in C\}$ is called a cover of the codeword space $\mathbb{F}_q^n$.


Lemma 2.5 Let ρ be the covering radius of C; then ρ is the smallest nonnegative integer r such that $\{B_r(c) \mid c \in C\}$ covers $\mathbb{F}_q^n$.


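Proof By the definition of ρ, for all $x \in \mathbb{F}_q^n$,

$$\min\{d(x, c) \mid c \in C\} \le \rho.$$

Therefore, for a given $x \in \mathbb{F}_q^n$, there is a codeword $c \in C$ with $d(x, c) \le \rho$, that is $x \in B_\rho(c)$. This shows that

$$\bigcup_{c \in C} B_\rho(c) = \mathbb{F}_q^n.$$

That is, $\{B_\rho(c) \mid c \in C\}$ forms a cover of $\mathbb{F}_q^n$. On the other hand, $\{B_{\rho-1}(c) \mid c \in C\}$ cannot cover $\mathbb{F}_q^n$, because if

$$\bigcup_{c \in C} B_{\rho-1}(c) = \mathbb{F}_q^n,$$

then for any $x \in \mathbb{F}_q^n$ there would be $c \in C$ with $x \in B_{\rho-1}(c)$, so

$$\min\{d(x, c) \mid c \in C\} \le \rho - 1.$$

Thus

$$\rho = \max\{\min\{d(x, c) \mid c \in C\} \mid x \in \mathbb{F}_q^n\} \le \rho - 1.$$

This contradiction shows that ρ is the smallest such integer. The lemma holds.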
Lemma 2.6 Let d be the minimum distance of C and ρ the covering radius of C; then

$$d \le 2\rho + 1.$$

Proof If $d > 2\rho + 1$, fix $c_0 \in C$. Then, by the triangle inequality,

$$B_{\rho+1}(c_0) \cap B_\rho(c) = \emptyset, \quad \forall c \in C,\ c \ne c_0.$$

Now choose $x \in B_{\rho+1}(c_0)$ with $d(x, c_0) = \rho + 1$ (possible since $\rho + 1 < d \le n$); then

$$x \notin B_\rho(c_0), \quad x \notin B_\rho(c),\ \forall c \in C,\ c \ne c_0.$$

That is, $\{B_\rho(c) \mid c \in C\}$ cannot cover $\mathbb{F}_q^n$, which contradicts Lemma 2.5. So we always have $d \le 2\rho + 1$. The Lemma holds.


Combining the above three lemmas, we obtain the following simple but very important corollaries.


Corollary 2.2 Let $C \subseteq \mathbb{F}_q^n$ be an arbitrary q-ary code, and let d, ρ, $\rho_1$ be the minimum distance, covering radius and disjoint radius of C respectively. Then

(i) $\rho_1 \le \rho$.
(ii) If the minimum distance of C is $d = 2e + 1$, then $e = \rho_1$.

Proof (i) follows directly from $2\rho_1 + 1 \le d \le 2\rho + 1$ (Lemmas 2.4 and 2.6). (ii) If $d = 2e + 1$ is odd, then by Lemma 2.4, $d = 2\rho_1 + 1 = 2e + 1$, so $e = \rho_1$.


Definition 2.3 A code C is called a perfect code if $\rho = \rho_1$.


Corollary 2.3 (i) The minimum distance of any perfect code C is $d = 2\rho + 1$.
(ii) Suppose the minimum distance of a code C is $d = 2e + 1$. Then C is a perfect code if and only if for every $x \in \mathbb{F}_q^n$ there is exactly one ball $B_e(c)$, $c \in C$, such that $x \in B_e(c)$.

Proof (i) follows directly from $2\rho_1 + 1 \le d \le 2\rho + 1$ and $\rho_1 = \rho$. To prove (ii), if C is a perfect code with minimum distance $d = 2e + 1$, then $\rho_1 = \rho = e$, so the balls $B_e(c)$, $c \in C$, are pairwise disjoint and cover $\mathbb{F}_q^n$; that is, every x lies in exactly one of them. Conversely, if the condition holds, then the covering radius of C satisfies $\rho \le e = \rho_1 \le \rho$, so $\rho_1 = \rho$ and C is a perfect code.


To introduce the concept of an error-correcting code, we discuss the decoding principle used in electronic information transmission, commonly known as the principle of decoding to the codeword that "looks most like" the received word. What does "looks most like" mean? Suppose that, after transmission through a channel with interference, we receive a word $x' \in \mathbb{F}_q^n$, and a codeword $x \in C$ satisfies

$$d(x, x') = \min\{d(c, x') \mid c \in C\}.$$

Then x is the codeword in C most similar to x', so we decode x' to x. If the most similar codeword x is unique in C, then in theory x' is the word received after x was transmitted, so decoding x' to x is accurate.
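The "look most like" principle is nearest-neighbor decoding with respect to the Hamming distance. A minimal sketch, assuming codewords are given as tuples and using hypothetical helper names, returns the unique nearest codeword and reports failure (`None`) when the nearest codeword is not unique:

```python
def dist(x, y):
    return sum(a != b for a, b in zip(x, y))


def nearest_codewords(received, code):
    """All codewords of `code` at minimum Hamming distance from `received`."""
    best = min(dist(c, received) for c in code)
    return [c for c in code if dist(c, received) == best]


def decode(received, code):
    """Return the unique most similar codeword, or None when it is not unique."""
    candidates = nearest_codewords(received, code)
    return candidates[0] if len(candidates) == 1 else None


repetition = [(0,) * 5, (1,) * 5]
print(decode((1, 0, 1, 1, 0), repetition))  # (1, 1, 1, 1, 1)
even_weight = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(decode((0, 0, 1), even_weight))       # None: three codewords are equally close
```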
Definition 2.4 A code C is called an e-error-correcting code ($e \ge 1$) if for any $x \in \mathbb{F}_q^n$, whenever there is a $c \in C$ with $x \in B_e(c)$, this c is unique.

An error-correcting code tolerates transmission errors without affecting correct decoding. For example, suppose that C is an e-error-correcting code. For any $c \in C$, let x be the word received after c is transmitted through a channel with interference. If at most e characters of c are corrupted during transmission, that is $d(c, x) \le e$, then the most similar codeword in C must be c, so we can decode $x \to c$ correctly.


Corollary 2.4 A perfect code with minimum distance $d = 2e + 1$ is an e-error-correcting code.

Proof The disjoint radius $\rho_1$ and the covering radius ρ of C satisfy $\rho_1 = \rho = e$. Therefore, for any received word $x \in \mathbb{F}_q^n$ there exists exactly one $c \in C$ with $x \in B_e(c)$. That is, C is an e-error-correcting code.


Finally, we prove the main conclusion of this section. 


Theorem 2.1 Suppose the minimum distance of a code C is $d = 2e + 1$. Then C is a perfect code if and only if the following sphere-packing condition holds:

$$|C|\sum_{i=0}^{e}\binom{n}{i}(q - 1)^i = q^n. \tag{2.4}$$


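Proof If the minimum distance of C is $d = 2e + 1$ and C is a perfect code, then $\rho = \rho_1 = e$. So

$$\bigcup_{c \in C} B_e(c) = \mathbb{F}_q^n,$$

and the balls in this union are pairwise disjoint. Then we have

$$\Big|\bigcup_{c \in C} B_e(c)\Big| = q^n,$$

thus, by Lemma 2.3,

$$|C|\sum_{i=0}^{e}\binom{n}{i}(q - 1)^i = q^n.$$

Conversely, suppose the sphere-packing condition (2.4) holds. Because the minimum distance of C is $d = 2e + 1$, Corollary 2.2 gives $\rho_1 = e$, so the balls $B_e(c)$, $c \in C$, are pairwise disjoint, and by (2.4) their union contains $q^n$ words. Hence

$$\bigcup_{c \in C} B_e(c) = \mathbb{F}_q^n.$$

It follows that $\rho \le e = \rho_1 \le \rho$, thus $\rho = \rho_1$, and C is a perfect code. The theorem holds.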
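The sphere-packing condition (2.4) is plain integer arithmetic, so it can be checked exactly. The sketch below assumes the well-known parameters of a few codes: the binary repetition code of length 7 (e = 3), the binary Hamming code [7, 4] with d = 3 (Theorem 2.3 below), and the binary Golay code G23, i.e. [23, 12] with d = 7. The helper name is hypothetical.

```python
from math import comb


def sphere_packing_holds(size, n, q, e):
    """Check condition (2.4): |C| * sum_{i=0}^{e} C(n, i)(q - 1)^i == q^n."""
    return size * sum(comb(n, i) * (q - 1) ** i for i in range(e + 1)) == q ** n


print(sphere_packing_holds(2, 7, 2, 3))         # repetition code, n = 2e + 1 = 7: True
print(sphere_packing_holds(2 ** 4, 7, 2, 1))    # binary Hamming code [7, 4]: True
print(sphere_packing_holds(2 ** 12, 23, 2, 3))  # binary Golay code G23: True
print(sphere_packing_holds(2, 4, 2, 1))         # repetition code of length 4: False
```

The last case fails because the balls of radius 1 around the two codewords hold only 10 of the 16 words; its minimum distance 4 is even, so it cannot be perfect anyway.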
When q = 2, the alphabet $\mathbb{F}_2$ is the finite field of two elements {0, 1}; the code is then called a binary code, and the transmission channel is called a binary channel. In binary channel transmission, the most important function is the binary entropy function $H(\lambda)$, defined as

$$H(\lambda) = \begin{cases} 0, & \lambda = 0 \text{ or } \lambda = 1,\\ -\lambda\log_2\lambda - (1 - \lambda)\log_2(1 - \lambda), & 0 < \lambda < 1. \end{cases} \tag{2.5}$$

Obviously, $H(\lambda) = H(1 - \lambda)$ and $0 \le H(\lambda) \le 1$, with $H(\frac{1}{2}) = 1$; that is, the maximum is reached at $\lambda = \frac{1}{2}$. For further properties of $H(\lambda)$, please refer to Chap. 1.

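Theorem 2.2 Let C be a perfect binary code with minimum distance $d = 2e + 1$, and let $R_C$ be the code rate of C. Then

(i) $$1 - R_C = \frac{1}{n}\log_2\sum_{i=0}^{e}\binom{n}{i} \le H\Big(\frac{e}{n}\Big).$$

(ii) When the codeword length $n \to \infty$, if $\lim_{n\to\infty} R_C = a$, then

$$\lim_{n\to\infty} H\Big(\frac{e}{n}\Big) = 1 - a.$$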


Proof (i) Since C is a perfect code, the sphere-packing condition gives

$$|C|\sum_{i=0}^{e}\binom{n}{i} = 2^n.$$

We have

$$\frac{1}{n}\log_2|C| - 1 = -\frac{1}{n}\log_2\sum_{i=0}^{e}\binom{n}{i},$$

that is

$$1 - R_C = \frac{1}{n}\log_2\sum_{i=0}^{e}\binom{n}{i} \le H\Big(\frac{e}{n}\Big).$$

The last inequality follows from Lemma 1.11 in Chap. 1, so (i) holds. If $R_C$ has a limit as $n \to \infty$, then again by Lemma 1.11 in Chap. 1 we have

$$\lim_{n\to\infty} H\Big(\frac{e}{n}\Big) = 1 - \lim_{n\to\infty} R_C = 1 - a.$$

Theorem 2.2 holds.
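Both sides of Theorem 2.2 (i) can be evaluated numerically. The sketch below uses logarithms to base 2, consistent with $H(\frac{1}{2}) = 1$, and assumes the standard parameters of the binary Hamming code [7, 4] (e = 1) and the binary Golay code [23, 12] (e = 3); for these perfect codes the left-hand side equals $1 - k/n$. The helper names are illustrative only.

```python
from math import comb, log2


def binary_entropy(lam):
    """H(lambda) of (2.5), with H(0) = H(1) = 0."""
    if not 0 <= lam <= 1:
        raise ValueError("lambda must lie in [0, 1]")
    if lam in (0, 1):
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)


def redundancy_and_bound(n, e):
    """Return ((1/n) log2 sum_{i<=e} C(n, i), H(e/n)), the two sides of Theorem 2.2 (i)."""
    lhs = log2(sum(comb(n, i) for i in range(e + 1))) / n
    return lhs, binary_entropy(e / n)


for n, e, k in [(7, 1, 4), (23, 3, 12)]:  # Hamming [7, 4] and Golay [23, 12]
    lhs, bound = redundancy_and_bound(n, e)
    print(n, round(lhs, 4), round(1 - k / n, 4), round(bound, 4))
```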


Finally, we give an example of perfect code. 


Example 2.1 Let $n = 2e + 1$ be an odd number. Then the repetition code $A = \{\mathbf{0}, \mathbf{1}\} \subset \mathbb{F}_2^n$, where $\mathbf{0} = 00\cdots0$ and $\mathbf{1} = 11\cdots1$, is a perfect code of length n.

First, the repetition code A contains only the two codewords $\mathbf{0}$ and $\mathbf{1}$, so its minimum distance is $d = n = 2e + 1$, which is odd; by Corollary 2.2 the disjoint radius of A is $\rho_1 = e$. Let us prove that the covering radius of A is $\rho = \rho_1 = e$. For any $x \in \mathbb{F}_2^n$, if $d(\mathbf{0}, x) > e$, that is $d(\mathbf{0}, x) \ge e + 1$, then at least $e + 1$ characters of $x = x_1x_2\cdots x_n$ are 1, so at most e characters are 0, thus $d(\mathbf{1}, x) \le e$. This shows that if $x \notin B_e(\mathbf{0})$, then $x \in B_e(\mathbf{1})$, that is

$$B_e(\mathbf{0}) \cup B_e(\mathbf{1}) = \mathbb{F}_2^n,$$

so $\rho \le e = \rho_1 \le \rho$, and we have $\rho = \rho_1$. That is, A is a perfect code. Note that in this example e can be any positive integer; as $n \to \infty$ the code rate of the repetition code tends to 0 and, in agreement with Theorem 2.2,

$$\lim_{n\to\infty} R_A = 0, \qquad \lim_{n\to\infty} H\Big(\frac{e}{n}\Big) = 1.$$
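Corollary 2.3 (ii) gives a direct brute-force test of perfection: with odd minimum distance $d = 2e + 1$, every word must lie in exactly one ball $B_e(c)$. The sketch below applies it to repetition codes; `is_perfect_binary` and `repetition_code` are hypothetical helper names.

```python
from itertools import product


def dist(x, y):
    return sum(a != b for a, b in zip(x, y))


def is_perfect_binary(code, n):
    """Corollary 2.3 (ii) by brute force: with odd minimum distance d = 2e + 1,
    C is perfect iff every word of F_2^n lies in exactly one ball B_e(c)."""
    d = min(dist(x, y) for i, x in enumerate(code) for y in code[i + 1:])
    if d % 2 == 0:
        return False  # by Corollary 2.3 (i) a perfect code has odd minimum distance
    e = (d - 1) // 2
    return all(sum(dist(x, c) <= e for c in code) == 1
               for x in product((0, 1), repeat=n))


def repetition_code(n):
    """The repetition code {00...0, 11...1} of length n."""
    return [(0,) * n, (1,) * n]


print([is_perfect_binary(repetition_code(n), n) for n in (3, 5, 7)])  # [True, True, True]
print(is_perfect_binary(repetition_code(4), 4))                       # False
```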

At the end of this section, we discuss and define the equivalence of two codes.

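Let $C \subseteq \mathbb{F}_q^n$ be a code of length n and $S_n$ the permutation group on n elements. For any permutation $\sigma \in S_n$ and $x = x_1x_2\cdots x_n \in \mathbb{F}_q^n$, we define

$$\sigma(x) = x_{\sigma(1)}x_{\sigma(2)}\cdots x_{\sigma(n)} \in \mathbb{F}_q^n, \tag{2.6}$$

$$\sigma(C) = \{\sigma(c) \mid c \in C\}. \tag{2.7}$$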


Definition 2.5 Let C and $C_1$ be two codes in $\mathbb{F}_q^n$. If there is $\sigma \in S_n$ such that $\sigma(C) = C_1$, C and $C_1$ are called equivalent, denoted $C \sim C_1$. Obviously, this is an equivalence relation between codes: taking σ to be the identity gives $C \sim C$. If $C \sim C_1$, that is $C_1 = \sigma(C)$, then $C = \sigma^{-1}(C_1)$, so $C_1 \sim C$. Similarly, if $C \sim C_1$ and $C_1 \sim C_2$, then $C \sim C_2$, because $C_1 = \sigma(C)$ and $C_2 = \tau(C_1)$ give $C_2 = \tau(\sigma(C))$, and $x \mapsto \tau(\sigma(x))$ is again a coordinate permutation. Another obvious property is that a permutation σ does not change the Hamming distance between two codewords, that is

$$d(\sigma(x), \sigma(y)) = d(x, y), \quad \forall \sigma \in S_n. \tag{2.8}$$


Lemma 2.7 Suppose $C \sim C_1$ are two equivalent codes. Then they have the same code rate, the same minimum distance, the same covering radius and the same disjoint radius. In particular, if C is a perfect code, then every code $C_1$ equivalent to C is a perfect code.

Proof All the results of the lemma follow easily from equation (2.8).
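Equation (2.8) and the invariance of the minimum distance in Lemma 2.7 can be confirmed by running over all of $S_n$ for a small code. The sketch uses 0-based indices for σ; the helper names and the sample codes are illustrative only.

```python
from itertools import permutations


def apply_perm(sigma, x):
    """sigma(x) = x_{sigma(1)} ... x_{sigma(n)} as in (2.6), using 0-based indices."""
    return tuple(x[sigma[i]] for i in range(len(x)))


def dist(x, y):
    return sum(a != b for a, b in zip(x, y))


def distances_preserved(code):
    """Check (2.8) for every sigma in S_n and every pair of codewords; also check
    that the equivalent code sigma(C) has the same minimum distance (Lemma 2.7)."""
    n = len(code[0])

    def min_distance(c):
        return min(dist(x, y) for i, x in enumerate(c) for y in c[i + 1:])

    for sigma in permutations(range(n)):
        image = [apply_perm(sigma, c) for c in code]
        for x in code:
            for y in code:
                if dist(apply_perm(sigma, x), apply_perm(sigma, y)) != dist(x, y):
                    return False
        if min_distance(image) != min_distance(code):
            return False
    return True


print(distances_preserved([(0, 0, 0, 0), (1, 1, 0, 0), (0, 1, 1, 1)]))  # True
```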


2.2 Linear Code 


Let $C \subseteq \mathbb{F}_q^n$ be a code. If C is a k-dimensional linear subspace of $\mathbb{F}_q^n$, C is called a linear code, denoted $C = [n, k]$. So for a linear code C we have

$$R_C = \frac{1}{n}\log_q|C| = \frac{k}{n},$$

and the minimum distance d of C equals its minimum weight w.


Let $\{\alpha_1, \alpha_2, \ldots, \alpha_k\} \subset C$ be a basis of the linear code C, where

$$\alpha_i = a_{i1}a_{i2}\cdots a_{in}, \quad a_{ij} \in \mathbb{F}_q,\ 1 \le i \le k.$$


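Definition 2.6 If $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ is a basis of the linear code $C = [n, k]$, then the $k \times n$ matrix

$$G = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_k\end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{k1} & a_{k2} & \cdots & a_{kn}\end{pmatrix}$$

is called a generator matrix of C. If

$$G = [I_k, A_{k\times(n-k)}],$$

where $I_k$ is the identity matrix of order k, G is said to be in standard form.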


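Lemma 2.8 Let $C = [n, k]$ be a linear code with generator matrix G; then

$$C = \{aG \mid a \in \mathbb{F}_q^k\}.$$

Proof Because $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ is a basis of C, for every $x \in C$,

$$x = a_1\alpha_1 + a_2\alpha_2 + \cdots + a_k\alpha_k = (a_1, a_2, \ldots, a_k)\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_k\end{pmatrix} = aG,$$

where $a = (a_1, a_2, \ldots, a_k) \in \mathbb{F}_q^k$; conversely, every aG is a linear combination of the $\alpha_i$ and so lies in C. The Lemma holds.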
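Lemma 2.8 translates directly into an enumeration of all products aG. The sketch assumes q is prime, so that arithmetic mod q is arithmetic in $\mathbb{F}_q$. The binary generator matrix below is a hypothetical example in standard form $[I_4, A]$; it happens to generate a binary Hamming code of length 7 of the kind discussed later in this section.

```python
from itertools import product


def codewords_from_generator(G, q):
    """C = {aG | a in F_q^k} (Lemma 2.8) for a k x n matrix G; q is assumed prime."""
    k, n = len(G), len(G[0])
    return {
        tuple(sum(a[i] * G[i][j] for i in range(k)) % q for j in range(n))
        for a in product(range(q), repeat=k)
    }


# A generator matrix in standard form [I_4, A] of a binary [7, 4] code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
C = codewords_from_generator(G, 2)
print(len(C))                                            # 16 = 2^4
print(min(sum(x != 0 for x in c) for c in C if any(c)))  # minimum weight 3
```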


Define an inner product on $\mathbb{F}_q^n$: for $x = x_1x_2\cdots x_n$, $y = y_1y_2\cdots y_n \in \mathbb{F}_q^n$, let $\langle x, y\rangle = \sum_{i=1}^{n} x_iy_i$. If $\langle x, y\rangle = 0$, we say x and y are orthogonal, denoted $x \perp y$.

Definition 2.7 Let $C = [n, k]$ be a linear code. Its orthogonal complement $C^\perp$ is

$$C^\perp = \{y \in \mathbb{F}_q^n \mid \langle x, y\rangle = 0,\ \forall x \in C\}.$$

Obviously, $C^\perp$ is an $[n, n - k]$ linear code, called the dual code of C. A generator matrix H of $C^\perp$ is called a check matrix of C.


Lemma 2.9 Let $C = [n, k]$ be a linear code with check matrix H; then

$$xH^T = 0 \iff x \in C,$$

where $H^T$ is the transpose of H.

Proof We prove the conclusion taking the generator matrix G of C in standard form. Let

$$G = [I_k, A], \quad A = A_{k\times(n-k)}.$$

Then a check matrix of C, that is, a generator matrix of the dual code $C^\perp$, is

$$H = [-A^T, I_{n-k}].$$

By Lemma 2.8, if $x \in C$, there is $a \in \mathbb{F}_q^k$ with $x = aG$, thus

$$xH^T = aGH^T = a[I_k, A]\begin{pmatrix}-A\\ I_{n-k}\end{pmatrix} = a(-A + A) = 0.$$

Conversely, suppose $xH^T = 0$. Because H is a generator matrix of $C^\perp$, again by Lemma 2.8, for every $y \in C^\perp$ there is $b \in \mathbb{F}_q^{n-k}$ with $y = bH$, thus

$$\langle x, y\rangle = xy^T = xH^Tb^T = 0 \Rightarrow x \in (C^\perp)^\perp = C.$$

The Lemma holds.
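The proof of Lemma 2.9 can be replayed numerically: build $H = [-A^T, I_{n-k}]$ from a standard-form generator matrix and check $xH^T = 0 \iff x \in C$ over all of $\mathbb{F}_q^n$. The sketch assumes q is prime; the matrices A used are hypothetical examples.

```python
from itertools import product


def check_matrix_from_standard(A, q):
    """For G = [I_k, A] over F_q (q prime), return H = [-A^T, I_{n-k}]."""
    k, r = len(A), len(A[0])
    return [[(-A[i][j]) % q for i in range(k)] + [int(t == j) for t in range(r)]
            for j in range(r)]


def verify_lemma_2_9(A, q):
    """Check x H^T = 0  <=>  x in C for every x in F_q^n, by brute force."""
    k, r = len(A), len(A[0])
    n = k + r
    G = [[int(t == i) for t in range(k)] + list(A[i]) for i in range(k)]
    H = check_matrix_from_standard(A, q)
    C = {tuple(sum(a[i] * G[i][j] for i in range(k)) % q for j in range(n))
         for a in product(range(q), repeat=k)}
    for x in product(range(q), repeat=n):
        check_value = tuple(sum(xj * hj for xj, hj in zip(x, row)) % q for row in H)
        if (check_value == (0,) * r) != (x in C):
            return False
    return True


print(check_matrix_from_standard([[1, 2], [2, 2]], 3))  # [[2, 1, 1, 0], [1, 1, 0, 1]]
print(verify_lemma_2_9([[1, 2], [2, 2]], 3))            # True
```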
By Lemma 2.9, for all $x, y \in \mathbb{F}_q^n$,

$$xH^T = yH^T \iff x - y \in C.$$

Here C is an additive subgroup of $\mathbb{F}_q^n$, and $xH^T$ is called the check value of the word x. Thus two words have equal check values if and only if they lie in the same additive coset of C. This yields the following decoding principle for linear codes.

Decoding principle: Suppose the linear code $C = [n, k]$ is used for coding and, after transmission through an interference channel, the received word is $x \in \mathbb{F}_q^n$. Find a word $x_0$ of least weight in the additive coset $x + C$ of x, that is, $x_0$ satisfies

$$x_0 \in x + C, \quad w(x_0) = \min\{w(\alpha) \mid \alpha \in x + C\}.$$

$x_0$ is called the leader of the coset $x + C$. We decode x as $x - x_0$.
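Since the words with the same check value as x form exactly the coset $x + C$, the decoding principle can be implemented with a table from check values to coset leaders. The sketch below is a brute-force illustration for small n, assuming q is prime. It reuses, as a hypothetical example, the check matrix $H = [A^T, I_3]$ of a binary [7, 4] code with $G = [I_4, A]$ (over $\mathbb{F}_2$, $-A^T = A^T$).

```python
from itertools import product


def check_value(x, H, q):
    """Check value x H^T over F_q (q prime)."""
    return tuple(sum(a * b for a, b in zip(x, row)) % q for row in H)


def build_leader_table(H, q):
    """Map each check value to a least-weight word having it, i.e. a coset leader."""
    n = len(H[0])
    table = {}
    # Visit words in order of increasing weight so the first word seen is a leader.
    for x in sorted(product(range(q), repeat=n), key=lambda v: sum(a != 0 for a in v)):
        table.setdefault(check_value(x, H, q), x)
    return table


def decode(x, H, q, table):
    """Decode x as x - x0, where x0 is the leader of the coset x + C."""
    x0 = table[check_value(x, H, q)]
    return tuple((a - b) % q for a, b in zip(x, x0))


H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
table = build_leader_table(H, 2)
sent = (1, 0, 1, 1, 0, 1, 0)
received = (1, 0, 1, 0, 0, 1, 0)      # one character flipped in transmission
print(decode(received, H, 2, table))  # (1, 0, 1, 1, 0, 1, 0)
```

Building the table enumerates all $q^n$ words, which is only practical for small codes.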


Lemma 2.10 If the minimum distance of the linear code $C = [n, k]$ is $d = 2e + 1$, then any additive coset $x + C$ of C contains at most one word $x_0$ with $w(x_0) \le e$.

Proof Suppose $\alpha, \beta \in x + C$ with $w(\alpha) \le e$ and $w(\beta) \le e$. Then $\alpha - \beta \in C$ and $w(\alpha - \beta) \le w(\alpha) + w(\beta) \le 2e$. But the minimum weight of C equals its minimum distance $2e + 1$, so $\alpha - \beta = 0$, thus $\alpha = \beta$. The Lemma holds.


Corollary 2.5 For a perfect linear code $C = [n, k]$ with minimum distance $d = 2e + 1$, every additive coset $x + C$ of C contains exactly one word of weight $\le e$. In other words, the leader of every coset is unique.

Proof For $x \in \mathbb{F}_q^n$ there is $c \in C$ with $x \in B_e(c)$, that is $d(c, x) \le e$, so $w(x - c) \le e$; but $x - c \in x + C$. Uniqueness follows from Lemma 2.10, and every other word of $x + C$ has weight at least $(2e + 1) - e > e$, so this word is the unique leader. The corollary holds.


Definition 2.8 If any two column vectors of the generator matrix G of a linear code 
C = [n, k] are linearly independent, C is called a projective code. 


To see the true meaning of projective codes, we consider the $(k - 1)$-dimensional projective space $PG(k - 1, q)$ over $\mathbb{F}_q$.

For two vectors $a = (a_1, a_2, \ldots, a_k)$, $b = (b_1, b_2, \ldots, b_k)$ in $\mathbb{F}_q^k$, write $a \sim b$ if there is $\lambda \in \mathbb{F}_q^*$ with $a = \lambda b$. This is an equivalence relation on $\mathbb{F}_q^k$. Obviously $b \sim 0 \iff b = 0$. For any $a \in \mathbb{F}_q^k$, let $\bar{a} = \{\lambda a \mid \lambda \in \mathbb{F}_q^*\}$; the quotient set $\mathbb{F}_q^k/\!\sim$ is called the $(k - 1)$-dimensional projective space over $\mathbb{F}_q$, denoted $PG(k - 1, q)$. Therefore

$$PG(k - 1, q) = \mathbb{F}_q^k/\!\sim\ = \{\bar{a} \mid a \in \mathbb{F}_q^k\}.$$

The number of nonzero points in $PG(k - 1, q)$ is

$$|PG(k - 1, q)| = \frac{q^k - 1}{q - 1} = 1 + q + \cdots + q^{k-1}.$$
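Each nonzero point $\bar{a}$ of $PG(k - 1, q)$ has exactly one representative whose first nonzero coordinate is 1, which gives an easy way to list the points and confirm the count $(q^k - 1)/(q - 1)$. The sketch assumes q is prime; the normalization convention is a choice made here, not something fixed by the text.

```python
from itertools import product


def projective_points(k, q):
    """Representatives of the nonzero points of PG(k - 1, q), q prime: each class
    {lambda a | lambda in F_q^*} is represented by the vector whose first nonzero
    coordinate equals 1."""
    points = []
    for a in product(range(q), repeat=k):
        nonzero = [t for t in a if t != 0]
        if nonzero and nonzero[0] == 1:
            points.append(a)
    return points


for q in (2, 3, 5):
    for k in (1, 2, 3):
        assert len(projective_points(k, q)) == (q ** k - 1) // (q - 1)
print(projective_points(2, 3))  # [(0, 1), (1, 0), (1, 1), (1, 2)]
```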


For a linear code $[n, n - k]$, its check matrix H is a $k \times n$ matrix. If any two column vectors of $H = [a_1, a_2, \ldots, a_n]$ are linearly independent, then $a_1, a_2, \ldots, a_n$ represent n different nonzero points of $PG(k - 1, q)$. In the same way, the generator matrix of an $[n, k]$ projective code consists of n different nonzero points of the projective space $PG(k - 1, q)$. Hence $n \le |PG(k - 1, q)|$, and when the maximum is reached, i.e.

$$n = |PG(k - 1, q)| = \frac{q^k - 1}{q - 1},$$

we obtain a perfect example of linear codes, called Hamming codes.


Definition 2.9 Let $k > 1$ and $n = \frac{q^k - 1}{q - 1}$. A linear code $C = [n, n - k]$ is called a Hamming code if any two column vectors of the check matrix H of C are linearly independent.


Since C is an $(n - k)$-dimensional linear subspace and $C^\perp$ is a k-dimensional linear subspace, the generator matrix H of $C^\perp$ is a $k \times n$ matrix. Therefore, if any two column vectors of H are linearly independent, they represent n different points of the projective space $PG(k - 1, q)$. Because $n = \frac{q^k - 1}{q - 1}$, the columns of H exhaust all points of $PG(k - 1, q)$, so Hamming codes are the longest codes this construction allows.
Theorem 2.3 Any Hamming code $C = [n, n - k]$ is a perfect code with minimum distance $d = 3$; therefore, Hamming codes are perfect 1-error-correcting codes.


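Proof We first prove that the minimum distance of the Hamming code C is $d \ge 3$. Because the minimum distance of a linear code equals its minimum weight, if $d \le 2$ there is a nonzero $x = x_1x_2\cdots x_n \in C$ with $w(x) \le 2$, that is, at most two characters $x_i$ and $x_j$ are nonzero.

Let $H = [\alpha_1, \alpha_2, \ldots, \alpha_n]$ be the check matrix of C, with columns $\alpha_1, \ldots, \alpha_n$. Since $xH^T = 0$,

$$(x_1, x_2, \ldots, x_n)\begin{pmatrix}\alpha_1^T\\ \alpha_2^T\\ \vdots\\ \alpha_n^T\end{pmatrix} = 0.$$

We have $x_i\alpha_i + x_j\alpha_j = 0$, thus $\alpha_i$ and $\alpha_j$ are linearly dependent, a contradiction. So $d \ge 3$, and by Lemma 2.4 the disjoint radius of C satisfies $\rho_1 \ge 1$.

On the other hand, for $c \in C$, by Lemma 2.3 the number of elements of the ball $B_1(c)$ is

$$|B_1(c)| = 1 + n(q - 1) = q^k.$$

Because C is an $(n - k)$-dimensional linear subspace, $|C| = q^{n-k}$, and the balls $B_1(c)$, $c \in C$, are pairwise disjoint, so

$$\Big|\bigcup_{c \in C} B_1(c)\Big| = q^{n-k}\cdot q^k = q^n,$$

that is,

$$\bigcup_{c \in C} B_1(c) = \mathbb{F}_q^n.$$

Hence $\rho \le 1$, and $1 \le \rho_1 \le \rho \le 1$ gives $\rho_1 = \rho = 1$. C is a perfect code, and its minimum distance is $d = 2\rho + 1 = 3$. The theorem holds.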
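Theorem 2.3 can be confirmed by brute force for the smallest cases. The sketch builds the check matrix from one representative of every point of $PG(k - 1, q)$ (first nonzero coordinate equal to 1, an assumed normalization), enumerates $C = \{x \mid xH^T = 0\}$, and checks the minimum weight and the sphere-packing condition (2.4) with e = 1. It assumes q is prime and keeps $q^n$ small.

```python
from itertools import product


def hamming_code(k, q):
    """Brute-force the Hamming code [n, n - k] over F_q (q prime, small k): the
    columns of the k x n check matrix H are one representative of every point
    of PG(k - 1, q)."""
    cols = [a for a in product(range(q), repeat=k)
            if any(a) and next(t for t in a if t != 0) == 1]
    n = len(cols)
    code = [x for x in product(range(q), repeat=n)
            if all(sum(x[j] * cols[j][i] for j in range(n)) % q == 0 for i in range(k))]
    return n, code


for k, q in [(2, 2), (3, 2), (2, 3)]:
    n, code = hamming_code(k, q)
    d = min(sum(t != 0 for t in c) for c in code if any(c))
    packed = len(code) * (1 + n * (q - 1)) == q ** n  # condition (2.4) with e = 1
    print(n, len(code), d, packed)
```

The three lines printed are `3 2 3 True`, `7 16 3 True` and `4 9 3 True`; the first case is the repetition code of length 3.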


Next, we discuss the weight polynomial of a linear code C and prove the famous MacWilliams theorem.

For $x \in C = [n, k]$, the weight $w(x)$ takes values from 0 to n; indeed $w(x) = 0 \iff x = 0$, and $w(x) = n \iff x = x_1\cdots x_n$ with every $x_i \ne 0$. So for each i, $0 \le i \le n$, define

$$A_i = \#\{x \in C \mid w(x) = i\}$$

and the weight polynomial of C,

$$A(z) = \sum_{i=0}^{n} A_iz^i, \quad z \text{ a variable}.$$

Obviously, for any given $c \in C$, the number of codewords in C whose Hamming distance to c is exactly i is $A_i$ (since $x \mapsto x - c$ is a bijection of C and $d(x, c) = w(x - c)$), that is
$$A_i = \#\{x \in C \mid d(x, c) = i\}.$$


Codes with this property are called distance-invariant codes; obviously, linear codes are distance-invariant.

The following result was proved by MacWilliams in 1963. It establishes the relationship between the weight polynomials of a linear code C and its dual code $C^\perp$, and it is one of the most basic results in coding theory.


Theorem 2.4 (MacWilliams) Let $C = [n, k]$ be a linear code over $\mathbb{F}_q$ with weight polynomial A(z), and let $C^\perp$ be the dual code of C, with weight polynomial B(z). Then

$$B(z) = q^{-k}(1 + (q - 1)z)^n A\Big(\frac{1 - z}{1 + (q - 1)z}\Big).$$

In particular, when q = 2,

$$2^kB(z) = (1 + z)^n A\Big(\frac{1 - z}{1 + z}\Big).$$
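Proof Let ψ be a nontrivial additive character of $\mathbb{F}_q$. For example, ψ can be constructed as

$$\psi(a) = e^{2\pi\sqrt{-1}\,\mathrm{tr}(a)/p}, \quad a \in \mathbb{F}_q,$$

where p is the characteristic of $\mathbb{F}_q$ and tr is the trace map from $\mathbb{F}_q$ to $\mathbb{F}_p$.

For any $c \in C$, we define the polynomial $g_c(z)$ as

$$g_c(z) = \sum_{x \in \mathbb{F}_q^n} z^{w(x)}\psi(\langle x, c\rangle), \tag{2.9}$$

therefore

$$\sum_{c \in C} g_c(z) = \sum_{x \in \mathbb{F}_q^n} z^{w(x)}\sum_{c \in C}\psi(\langle x, c\rangle). \tag{2.10}$$

Let us calculate the inner sum of (2.10). If $x \in C^\perp$, then

$$\sum_{c \in C}\psi(\langle x, c\rangle) = |C|.$$

If $x \notin C^\perp$, we prove

$$\sum_{c \in C}\psi(\langle x, c\rangle) = 0. \tag{2.11}$$

For $x \in \mathbb{F}_q^n$, $x \notin C^\perp$, let

$$T(x) = \{y \in C \mid \langle y, x\rangle = 0\} \subseteq C,$$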
so T(x) is a linear subspace of C. For $c \in C$, consider the additive coset $c + T(x)$; any two codewords $c + y_1$ and $c + y_2$ in this coset satisfy

$$\langle c + y_1, x\rangle = \langle c + y_2, x\rangle = \langle c, x\rangle.$$

Conversely, for two additive cosets $c_1 + T(x)$ and $c_2 + T(x)$, if $\langle c_1, x\rangle = \langle c_2, x\rangle$, then $\langle c_1 - c_2, x\rangle = 0$, that is $c_1 - c_2 \in T(x)$, so $c_1 + T(x) = c_2 + T(x)$. Therefore all codewords in a coset $c + T(x) \subseteq C$ have the same inner product with x, and different cosets have different inner products with x. Because $x \notin C^\perp$, there is $c_0 \in C$ with $\langle c_0, x\rangle = a \ne 0$; let $c_1 = a^{-1}c_0 \in C$, then $\langle c_1, x\rangle = 1$. Therefore, for every $a \in \mathbb{F}_q$, $\langle ac_1, x\rangle = a$, so $\langle c, x\rangle$ takes every value in $\mathbb{F}_q$, and each value is taken by exactly $|T(x)|$ codewords c (one coset of T(x)). Hence

$$\sum_{c \in C}\psi(\langle x, c\rangle) = |T(x)|\sum_{a \in \mathbb{F}_q}\psi(a) = 0,$$

since the sum of a nontrivial additive character over $\mathbb{F}_q$ is 0.


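That is, (2.11) holds. From (2.10), we get

$$\sum_{c \in C} g_c(z) = |C|\sum_{x \in C^\perp} z^{w(x)} = |C|B(z). \tag{2.12}$$

Define the weight of $a \in \mathbb{F}_q$ by $w(a) = 1$ if $a \ne 0$ and $w(0) = 0$. For $x \in \mathbb{F}_q^n$ and $c \in C$, write $x = x_1x_2\cdots x_n$, $c = c_1c_2\cdots c_n$; then by the definition of $g_c$ we have

$$g_c(z) = \sum_{x_1, \ldots, x_n \in \mathbb{F}_q} z^{w(x_1) + \cdots + w(x_n)}\psi(c_1x_1 + \cdots + c_nx_n) = \prod_{i=1}^{n}\sum_{x \in \mathbb{F}_q} z^{w(x)}\psi(c_ix). \tag{2.13}$$

The inner sum of the above formula can be calculated as

$$\sum_{x \in \mathbb{F}_q} z^{w(x)}\psi(c_ix) = \begin{cases} 1 - z, & c_i \ne 0,\\ 1 + (q - 1)z, & c_i = 0, \end{cases}$$

because for $c_i \ne 0$, $\sum_{x \ne 0}\psi(c_ix) = -\psi(0) = -1$. From (2.13) we have

$$g_c(z) = (1 - z)^{w(c)}(1 + (q - 1)z)^{n - w(c)}.$$

Thus

$$\sum_{c \in C} g_c(z) = (1 + (q - 1)z)^n\sum_{c \in C}\Big(\frac{1 - z}{1 + (q - 1)z}\Big)^{w(c)} = (1 + (q - 1)z)^nA\Big(\frac{1 - z}{1 + (q - 1)z}\Big).$$

Finally, from (2.12) we have

$$B(z) = \frac{1}{|C|}\sum_{c \in C} g_c(z) = q^{-k}(1 + (q - 1)z)^nA\Big(\frac{1 - z}{1 + (q - 1)z}\Big).$$

We have completed the proof of the theorem.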
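The MacWilliams identity can be checked exactly on small codes: compute the weight distributions of C and of $C^\perp$ by brute force and compare both sides of Theorem 2.4 at a few rational points with `fractions.Fraction`. The sketch assumes q is prime and uses $|C|$ in place of $q^k$ (they agree when G has full rank); the generator matrices are hypothetical examples, the first being the binary [7, 4] Hamming code whose weight polynomial is $1 + 7z^3 + 7z^4 + z^7$.

```python
from fractions import Fraction
from itertools import product


def span(G, q):
    """All codewords aG of the linear code generated by the rows of G (q prime)."""
    k, n = len(G), len(G[0])
    return {tuple(sum(a[i] * G[i][j] for i in range(k)) % q for j in range(n))
            for a in product(range(q), repeat=k)}


def dual_code(code, n, q):
    """C^perp by brute force over F_q^n."""
    return [y for y in product(range(q), repeat=n)
            if all(sum(a * b for a, b in zip(c, y)) % q == 0 for c in code)]


def weight_distribution(code, n):
    """Coefficients A_0, ..., A_n of the weight polynomial."""
    counts = [0] * (n + 1)
    for c in code:
        counts[sum(t != 0 for t in c)] += 1
    return counts


def poly(coeffs, z):
    return sum(c * z ** i for i, c in enumerate(coeffs))


def macwilliams_holds(G, q, points=(0, Fraction(1, 3), 2, 5)):
    """Compare B(z) with |C|^{-1}(1 + (q-1)z)^n A((1 - z)/(1 + (q-1)z)) at sample z."""
    C = span(G, q)
    n = len(G[0])
    A = weight_distribution(C, n)
    B = weight_distribution(dual_code(C, n, q), n)
    for z in map(Fraction, points):
        s = 1 + (q - 1) * z
        if poly(B, z) != Fraction(1, len(C)) * s ** n * poly(A, (1 - z) / s):
            return False
    return True


G = [[1, 0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1, 1]]
print(weight_distribution(span(G, 2), 7))  # [1, 0, 0, 7, 7, 0, 0, 1]
print(macwilliams_holds(G, 2))             # True
```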


2.3 Lee Distance 


Let $m > 1$ be a positive integer and $\mathbb{Z}_m$ the residue class ring mod m. If $\mathbb{Z}_m$ is the alphabet and $C \subsetneq \mathbb{Z}_m^n$ is a proper subset, then C is called an m-ary code. In this case the Hamming distance is not the best tool for measuring errors, and we use the Lee distance and Lee weight instead. For $i \in \mathbb{Z}_m$ (represented by $0 \le i < m$), define the Lee weight as

$$W_L(i) = \min\{i, m - i\}. \tag{2.14}$$

Obviously,

$$W_L(0) = 0, \quad W_L(-i) = W_L(m - i) = W_L(i). \tag{2.15}$$

For $a = (a_1, a_2, \ldots, a_n) = a_1a_2\cdots a_n \in \mathbb{Z}_m^n$ and $b = b_1b_2\cdots b_n \in \mathbb{Z}_m^n$, define the Lee weight and Lee distance on m-ary codes as follows:

$$W_L(a) = \sum_{i=1}^{n} W_L(a_i), \qquad d_L(a, b) = W_L(a - b).$$

From (2.15), we have

$$W_L(-a) = W_L(a), \quad d_L(a, b) = d_L(b, a), \quad \forall a, b \in \mathbb{Z}_m^n.$$
Lemma 2.11 For all $a, b, c \in \mathbb{Z}_m^n$, we have the following triangle inequality:

$$d_L(a, b) \le d_L(a, c) + d_L(c, b).$$
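Proof Suppose $0 \le i < m$, $0 \le j < m$. We first show

$$W_L(i + j) \le W_L(i) + W_L(j). \tag{2.16}$$

If $0 \le i + j \le \frac{m}{2}$, then $i, j \le \frac{m}{2}$ as well, and

$$W_L(i + j) = i + j = W_L(i) + W_L(j).$$

If $\frac{m}{2} < i + j < m$, we discuss three cases (i and j cannot both exceed $\frac{m}{2}$):

(1) $i \le \frac{m}{2}$, $j \le \frac{m}{2}$: then
$$W_L(i + j) = m - i - j < i + j = W_L(i) + W_L(j).$$
(2) $i \le \frac{m}{2}$, $j > \frac{m}{2}$: then
$$W_L(i + j) = m - i - j \le m - j = W_L(j) \le W_L(i) + W_L(j).$$
(3) $i > \frac{m}{2}$, $j \le \frac{m}{2}$: then
$$W_L(i + j) = m - i - j \le m - i = W_L(i) \le W_L(i) + W_L(j).$$

Finally, if $i + j \ge m$, then $i, j \ge 1$; put $i' = m - i$, $j' = m - j$, so that $i' + j' \le m$. By (2.15) and the cases above (the case $i' + j' = m$ being trivial),

$$W_L(i + j) = W_L(-(i + j)) = W_L(i' + j') \le W_L(i') + W_L(j') = W_L(i) + W_L(j).$$

So (2.16) always holds, and in $\mathbb{Z}_m^n$ it extends to

$$W_L(a + b) \le W_L(a) + W_L(b), \quad \forall a, b \in \mathbb{Z}_m^n.$$

So

$$d_L(a, b) = W_L(a - b) = W_L((a - c) + (c - b)) \le W_L(a - c) + W_L(c - b) = d_L(a, c) + d_L(c, b).$$

The Lemma holds.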
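The Lee weight and Lee distance are straightforward to compute, and Lemma 2.11 can be confirmed exhaustively for small m and n. The helper names below are illustrative only; characters are reduced mod m before the weight is taken.

```python
from itertools import product


def lee_weight(a, m):
    """W_L(a) = sum_i min(a_i, m - a_i), with each a_i reduced mod m; cf. (2.14)."""
    return sum(min(t % m, m - t % m) for t in a)


def lee_distance(a, b, m):
    """d_L(a, b) = W_L(a - b)."""
    return lee_weight([(x - y) % m for x, y in zip(a, b)], m)


def triangle_inequality_holds(m, n):
    """Brute-force check of Lemma 2.11 on Z_m^n."""
    words = list(product(range(m), repeat=n))
    return all(lee_distance(a, b, m) <= lee_distance(a, c, m) + lee_distance(c, b, m)
               for a in words for b in words for c in words)


print(lee_weight((0, 1, 2, 3), 4))      # 0 + 1 + 2 + 1 = 4
print(lee_distance((0, 3), (1, 1), 5))  # W_L(4) + W_L(2) = 1 + 2 = 3
print(triangle_inequality_holds(5, 2))  # True
```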


Next let m = 4, so the alphabet is $\mathbb{Z}_4$; we discuss the Lee weight and Lee distance on 4-ary codes. For $a = a_1a_2\cdots a_n \in \mathbb{Z}_4^n$ and $0 \le i \le 3$, let

$$n_i(a) = \#\{j \mid 1 \le j \le n,\ a_j = i\}. \tag{2.17}$$

$n_i(a)$ is the number of characters equal to i in the codeword a. For a 4-ary code $C \subseteq \mathbb{Z}_4^n$, the symmetrized weight polynomial and the Lee weight polynomial of C are defined as

$$\mathrm{swe}_C(w, x, y) = \sum_{c \in C} w^{n_0(c)}x^{n_1(c) + n_3(c)}y^{n_2(c)} \tag{2.18}$$

and

$$\mathrm{Lee}_C(x, y) = \sum_{c \in C} x^{2n - W_L(c)}y^{W_L(c)}. \tag{2.19}$$
Lemma 2.12 Let $C \subseteq \mathbb{Z}_4^n$ be a 4-ary code of length n. Then the symmetrized weight polynomial and the Lee weight polynomial of C satisfy

$$\mathrm{Lee}_C(x, y) = \mathrm{swe}_C(x^2, xy, y^2).$$

Proof For $a \in \mathbb{Z}_4^n$, by definition

$$n_0(a) + n_1(a) + n_2(a) + n_3(a) = n.$$

Let $a = a_1a_2\cdots a_n$; then

$$W_L(a) = \sum_{i=1}^{n} W_L(a_i) = n_1(a) + 2n_2(a) + n_3(a),$$

hence $2n - W_L(a) = 2n_0(a) + n_1(a) + n_3(a)$. So

$$\mathrm{Lee}_C(x, y) = \sum_{c \in C} x^{2n_0(c) + n_1(c) + n_3(c)}y^{n_1(c) + 2n_2(c) + n_3(c)} = \mathrm{swe}_C(x^2, xy, y^2).$$

The Lemma holds.


By using the Lee weight and Lee distance, we can extend the MacWilliams theorem to $\mathbb{Z}_4$ codes:

Theorem 2.5 Let $C \subseteq \mathbb{Z}_4^n$ be a linear code, $C^\perp$ its dual code, and $\mathrm{Lee}_C(x, y)$ the Lee weight polynomial of C. Then

$$\mathrm{Lee}_{C^\perp}(x, y) = \frac{1}{|C|}\mathrm{Lee}_C(x + y, x - y).$$


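Proof Let ψ be the nontrivial character of $\mathbb{Z}_4$ given by

$$\psi(j) = (\sqrt{-1})^j, \quad j = 0, 1, 2, 3.$$

Let f(u) be a function defined on $\mathbb{Z}_4^n$, and let

$$g(c) = \sum_{u \in \mathbb{Z}_4^n} f(u)\psi(\langle c, u\rangle). \tag{2.20}$$

As in the proof of Theorem 2.4, we have

$$\sum_{c \in C} g(c) = |C|\sum_{u \in C^\perp} f(u). \tag{2.21}$$

Take

$$f(u) = w^{n_0(u)}x^{n_1(u) + n_3(u)}y^{n_2(u)}, \quad u \in \mathbb{Z}_4^n.$$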
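Write $u = u_1u_2\cdots u_n \in \mathbb{Z}_4^n$; then for each i, $0 \le i \le 3$,

$$n_i(u) = n_i(u_1) + n_i(u_2) + \cdots + n_i(u_n).$$

Thus

$$f(u) = \prod_{i=1}^{n} f(u_i).$$

Let $c = c_1c_2\cdots c_n \in \mathbb{Z}_4^n$; by (2.20),

$$g(c) = \prod_{i=1}^{n}\sum_{u \in \mathbb{Z}_4} f(u)\psi(c_iu). \tag{2.22}$$

Now we calculate the inner sum on the right side of (2.22):

$$\sum_{u \in \mathbb{Z}_4} f(u)\psi(c_iu) = \begin{cases} w + 2x + y, & c_i = 0,\\ w - y, & c_i = 1 \text{ or } 3,\\ w - 2x + y, & c_i = 2. \end{cases}$$

By (2.22),

$$g(c) = (w + 2x + y)^{n_0(c)}(w - y)^{n_1(c) + n_3(c)}(w - 2x + y)^{n_2(c)},$$

so

$$\sum_{c \in C} g(c) = \mathrm{swe}_C(w + 2x + y,\ w - y,\ w - 2x + y).$$

By (2.21),

$$|C|\,\mathrm{swe}_{C^\perp}(w, x, y) = \mathrm{swe}_C(w + 2x + y,\ w - y,\ w - 2x + y).$$

By Lemma 2.12, replacing the variables $(w, x, y)$ by $(x^2, xy, y^2)$ and noting $x^2 + 2xy + y^2 = (x + y)^2$, $x^2 - y^2 = (x + y)(x - y)$, $x^2 - 2xy + y^2 = (x - y)^2$, we obtain

$$\mathrm{Lee}_{C^\perp}(x, y) = \frac{1}{|C|}\mathrm{Lee}_C(x + y, x - y).$$

We have completed the proof.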
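Lemma 2.12 and Theorem 2.5 are polynomial identities, so they can be tested by evaluating both sides at integer points with exact arithmetic. The sketch below builds a small $\mathbb{Z}_4$-linear code from hypothetical generator rows, finds $C^\perp$ by brute force, and compares $|C|\,\mathrm{Lee}_{C^\perp}(x, y)$ with $\mathrm{Lee}_C(x + y, x - y)$; multiplying through by |C| keeps everything in integers.

```python
from itertools import product


def lee_weight4(c):
    """Lee weight over Z_4: characters 1 and 3 weigh 1, character 2 weighs 2."""
    return sum(min(a, 4 - a) for a in c)


def span_z4(G):
    """All Z_4-linear combinations of the rows of G."""
    n = len(G[0])
    return {tuple(sum(a * g[j] for a, g in zip(coeffs, G)) % 4 for j in range(n))
            for coeffs in product(range(4), repeat=len(G))}


def dual_z4(code, n):
    """C^perp over Z_4 by brute force."""
    return [y for y in product(range(4), repeat=n)
            if all(sum(a * b for a, b in zip(c, y)) % 4 == 0 for c in code)]


def lee_poly(code, n, x, y):
    """Lee_C(x, y) of (2.19), evaluated at numbers x, y."""
    return sum(x ** (2 * n - lee_weight4(c)) * y ** lee_weight4(c) for c in code)


def swe(code, w, x, y):
    """swe_C(w, x, y) of (2.18), evaluated at numbers w, x, y."""
    return sum(w ** c.count(0) * x ** (c.count(1) + c.count(3)) * y ** c.count(2)
               for c in code)


def check_theorem_2_5(G, points=((1, 1), (2, 1), (3, -1), (2, 5))):
    code = span_z4(G)
    n = len(G[0])
    dual = dual_z4(code, n)
    for x, y in points:
        # Lemma 2.12, then Theorem 2.5 multiplied through by |C|
        if lee_poly(code, n, x, y) != swe(code, x * x, x * y, y * y):
            return False
        if len(code) * lee_poly(dual, n, x, y) != lee_poly(code, n, x + y, x - y):
            return False
    return True


print(check_theorem_2_5([[1, 1, 1]]))                   # True
print(check_theorem_2_5([[1, 0, 1, 1], [0, 2, 0, 2]]))  # True
```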
2.4 Some Typical Codes


2.4.1 Hadamard Codes 


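To introduce Hadamard codes, we first define a Hadamard matrix of order n. Let $H = (a_{ij})$ be a square matrix of order n with every $a_{ij} = \pm 1$. If

$$HH^T = nI_n = \begin{pmatrix} n & 0 & \cdots & 0\\ 0 & n & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & n\end{pmatrix},$$

H is called a Hadamard matrix of order n. It is easy to verify that the following $H_2$ is a Hadamard matrix of order 2:

$$H_2 = \begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}.$$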
To obtain Hadamard matrices of higher order, a useful tool is the Kronecker product. Let $A = (a_{ij})_{m\times m}$ and $B = (b_{ij})_{n\times n}$; the Kronecker product $A \otimes B$ of A and B is defined as

$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1m}B\\ a_{21}B & a_{22}B & \cdots & a_{2m}B\\ \vdots & \vdots & & \vdots\\ a_{m1}B & a_{m2}B & \cdots & a_{mm}B\end{pmatrix}.$$

Obviously, $A \otimes B$ is a square matrix of order mn. The following result is easy to prove.


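Lemma 2.13 Let A be a Hadamard matrix of order m and B a Hadamard matrix of order n; then $A \otimes B$ is a Hadamard matrix of order mn.

Proof Let $A = (a_{ij})_{m\times m}$, $B = (b_{ij})_{n\times n}$, $H = A \otimes B$. Then

$$HH^T = \begin{pmatrix} a_{11}B & \cdots & a_{1m}B\\ \vdots & & \vdots\\ a_{m1}B & \cdots & a_{mm}B\end{pmatrix}\begin{pmatrix} a_{11}B^T & \cdots & a_{m1}B^T\\ \vdots & & \vdots\\ a_{1m}B^T & \cdots & a_{mm}B^T\end{pmatrix} = \begin{pmatrix} c_{11}BB^T & \cdots & c_{1m}BB^T\\ \vdots & & \vdots\\ c_{m1}BB^T & \cdots & c_{mm}BB^T\end{pmatrix},$$

where $c_{ij} = \sum_{l=1}^{m} a_{il}a_{jl}$ is the (i, j) entry of $AA^T = mI_m$. Since $BB^T = nI_n$,

$$HH^T = \begin{pmatrix} mnI_n & 0 & \cdots & 0\\ 0 & mnI_n & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & mnI_n\end{pmatrix} = mnI_{mn}.$$

The Lemma holds.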


Since $H_2$ is a Hadamard matrix of order 2,

$$H_2 \otimes H_2 = H_2^{\otimes 2}, \qquad H_2 \otimes H_2 \otimes \cdots \otimes H_2 = H_2^{\otimes m}$$

are Hadamard matrices of order 4 and order $2^m$ respectively.
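The Kronecker construction of Lemma 2.13 is easy to reproduce with plain lists. The sketch below builds $H_2^{\otimes m}$ and checks the defining property $HH^T = nI_n$; the function names are illustrative, and no matrix library is assumed.

```python
def kron(A, B):
    """Kronecker product A (x) B of square matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def is_hadamard(H):
    """Entries are +-1 and H H^T = n I_n."""
    n = len(H)
    if any(len(row) != n or any(v not in (1, -1) for v in row) for row in H):
        return False
    return all(sum(H[i][t] * H[j][t] for t in range(n)) == (n if i == j else 0)
               for i in range(n) for j in range(n))


H2 = [[1, 1], [1, -1]]


def sylvester(m):
    """H_2^{(x) m}, a Hadamard matrix of order 2^m (Lemma 2.13)."""
    H = [[1]]
    for _ in range(m):
        H = kron(H2, H)
    return H


print(kron(H2, H2))  # [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
print([is_hadamard(sylvester(m)) for m in range(1, 6)])  # [True, True, True, True, True]
```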


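Let n be an even number and $H_n$ a Hadamard matrix of order n, with row vectors $\alpha_1, \alpha_2, \ldots, \alpha_n$, i.e.,

$$H_n = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix}, \qquad -H_n = \begin{pmatrix}-\alpha_1\\ -\alpha_2\\ \vdots\\ -\alpha_n\end{pmatrix}.$$

We get 2n row vectors $\{\pm\alpha_1, \pm\alpha_2, \ldots, \pm\alpha_n\}$. In each row vector $\pm\alpha_i$ we replace every component $-1$ by 0, and denote the resulting vectors by $\bar\alpha_i$ and $\overline{-\alpha_i}$ respectively; each of them is a vector of $\mathbb{F}_2^n$. The set

$$C = \{\bar\alpha_1, \overline{-\alpha_1}, \ldots, \bar\alpha_n, \overline{-\alpha_n}\} \subset \mathbb{F}_2^n$$

is called a Hadamard code.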


Theorem 2.6 The minimum distance of a Hadamard code C of length n (n even) is $d = \frac{n}{2}$.
Proof Let $H_n$ be a Hadamard matrix of order n with rows $\alpha_1, \ldots, \alpha_n$. Replacing each $\pm\alpha_i$ by its binary image (every $-1$ becomes 0) turns it into a binary codeword in $\mathbb{F}_2^n$. This replacement does not change the Hamming distance, that is

$$d(\bar\alpha_i, \bar\alpha_j) = d(\alpha_i, \alpha_j), \qquad d(\overline{-\alpha_i}, \overline{-\alpha_j}) = d(-\alpha_i, -\alpha_j),$$

where $i \ne j$. Let us prove that the minimum distance of C is $\frac{n}{2}$. Let $a = a_1a_2\cdots a_n$ and $b = b_1b_2\cdots b_n$ be two different row vectors of the Hadamard matrix $H_n$; then

$$ab^T = \sum_{i=1}^{n} a_ib_i = 0,$$

with $a_i = \pm 1$, $b_i = \pm 1$. Let $d_1$ be the number of positions where a and b agree and $d = d(a, b)$ the number of positions where they differ. Then $d_1 - d = 0$, that is $d_1 = d$; but $d_1 + d = n$, so $d = \frac{n}{2}$. The same computation applies to $\pm\alpha_i$ and $\pm\alpha_j$ with $i \ne j$, while $d(\bar\alpha_i, \overline{-\alpha_i}) = n$. The theorem holds.


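Corollary 2.6 Let $C = \{\bar\alpha_1, \overline{-\alpha_1}, \ldots, \bar\alpha_n, \overline{-\alpha_n}\}$ be a Hadamard code. Then the Hamming distance between the images of $\pm\alpha_i$ and $\pm\alpha_j$ with $i \ne j$ is $\frac{n}{2}$ (while $\bar\alpha_i$ and $\overline{-\alpha_i}$ are at distance n).

Proof For $i \ne j$, let $a = \pm\alpha_i$ and $b = \pm\alpha_j$; then

$$ab^T = \pm\alpha_i\alpha_j^T = 0 \Rightarrow d(a, b) = \frac{n}{2}.$$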
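Theorem 2.6 and Corollary 2.6 can be checked on Hadamard codes built from Sylvester matrices $H_2^{\otimes m}$. The sketch below forms the binary images of the 2n rows $\pm\alpha_i$ and collects the distances that occur between distinct codewords; the function names are illustrative only.

```python
def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def sylvester(m):
    """Hadamard matrix of order 2^m built as a Kronecker power of H_2."""
    H = [[1]]
    for _ in range(m):
        H = kron([[1, 1], [1, -1]], H)
    return H


def hadamard_code(H):
    """Binary images of the 2n rows +-alpha_i of H, replacing -1 by 0."""
    rows = [list(r) for r in H] + [[-v for v in r] for r in H]
    return [tuple(1 if v == 1 else 0 for v in r) for r in rows]


def distance_profile(code):
    """Set of Hamming distances between distinct codewords, and its minimum."""
    ds = {sum(a != b for a, b in zip(x, y))
          for i, x in enumerate(code) for y in code[i + 1:]}
    return ds, min(ds)


for m in (3, 5):
    code = hadamard_code(sylvester(m))
    print(2 ** m, len(set(code)), distance_profile(code))
```

For n = 8 and n = 32 the only distances that occur are n/2 and n, so the codes have parameters (8, 16, 4) and (32, 64, 16).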


A code of length n with M codewords and minimum distance d is denoted $(n, M, d)$, as distinct from the notation $[n, k]$ or $[n, k, d]$ for linear codes. The Hadamard code is

$$C = \Big(n, 2n, \frac{n}{2}\Big).$$

When n = 8, d = 4, this is an extended Hamming code. When n = 32, the (32, 64, 16) code is the code used by the U.S. Mars probe in 1969 to transmit pictures taken on Mars.


2.4.2 Binary Golay Codes 


In the theory and application of channel coding, the binary Golay code is the most famous code. To introduce the Golay code $G_{23}$ completely, we first introduce the concept of a t-(m, k, λ) design.
Let S be a set of m elements, that is $|S| = m$. The elements of S are called points. Let R be a collection of k-element subsets of S with $|R| = M$, i.e.,

$$R = \{B_1, B_2, \dots, B_M\}, \quad B_i \subset S, \quad |B_i| = k, \quad 1 \le i \le M.$$

Each element $B_i$ of R is called a block.

Definition 2.10 (S, R) is called a $t$-$(m, k, \lambda)$ design if, for any $T \subset S$ with $|T| = t$, there are exactly λ blocks B in R such that $T \subset B$. If (S, R) is a $t$-$(m, k, \lambda)$ design, we write $(S, R) = t$-$(m, k, \lambda)$. If $\lambda = 1$, the $t$-$(m, k, 1)$ design is called a Steiner system.


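For a $t$-$(m, k, \lambda)$ design (S, R) we introduce its occurrence matrix. For any $a \in S$, the characteristic function $\chi_i(a)$ is defined as

$$\chi_i(a) = \begin{cases} 1, & \text{if } a \in B_i, \\ 0, & \text{if } a \notin B_i. \end{cases}$$

Write $S = \{a_1, a_2, \dots, a_m\}$ and $R = \{B_1, B_2, \dots, B_M\}$, $|R| = M$. The matrix

$$A = (\chi_j(a_i))_{m\times M} = \begin{pmatrix} \chi_1(a_1) & \chi_2(a_1) & \cdots & \chi_M(a_1) \\ \chi_1(a_2) & \chi_2(a_2) & \cdots & \chi_M(a_2) \\ \vdots & \vdots & & \vdots \\ \chi_1(a_m) & \chi_2(a_m) & \cdots & \chi_M(a_m) \end{pmatrix}$$

is called the occurrence matrix of the $t$-$(m, k, \lambda)$ design.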

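Let us now consider a concrete example, the $2$-$(11, 6, 3)$ design. Here S has 11 points, every block of R has 6 points, and any two points of S are contained in exactly three blocks.

Lemma 2.14 In a $2$-$(11, 6, 3)$ design, the number of blocks and the number of blocks through each point are uniquely determined. That is, let $S = \{a_1, a_2, \dots, a_{11}\}$; then there are 11 blocks in R,

$$R = \{B_1, B_2, \dots, B_{11}\},$$

and for any $a \in S$, exactly 6 blocks $B_j$ in R contain a.

Proof Suppose that a given point $a \in S$ lies in exactly l blocks $B_j$. Count the pairs (b, B) with $b \neq a$ and $a, b \in B \in R$. Each of the l blocks containing a contributes its 5 other points. On the other hand, each of the 10 points $b \neq a$ lies together with a in exactly 3 blocks. Hence $6l - l = 10 \times 3$, so $l = 6$. In addition, suppose $|R| = M$. Every block has 6 points and every point lies in exactly 6 blocks, so counting incidences gives $6 \times M = 11 \times 6$, and therefore $M = 11$.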


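By Lemma 2.14, the occurrence matrix N of the $2$-$(11, 6, 3)$ design is a square matrix of order 11,

$$N = (\chi_j(a_i))_{11\times 11} = \begin{pmatrix} \chi_1(a_1) & \chi_2(a_1) & \cdots & \chi_{11}(a_1) \\ \chi_1(a_2) & \chi_2(a_2) & \cdots & \chi_{11}(a_2) \\ \vdots & \vdots & & \vdots \\ \chi_1(a_{11}) & \chi_2(a_{11}) & \cdots & \chi_{11}(a_{11}) \end{pmatrix}.$$

Every row of N has exactly six 1's and five 0's, and every column of N has exactly six 1's and five 0's.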


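Lemma 2.15 Let N be the occurrence matrix of the $2$-$(11, 6, 3)$ design. Then

$$NN' = 3I_{11} + 3J_{11}, \qquad J_{11} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}_{11\times 11}.$$

If N is regarded as a square matrix of order 11 over $\mathbb{F}_2$, then

$$NN' = I_{11} + J_{11}.$$

Further, $\operatorname{rank}(N) = 10$, and the solutions of the linear system $XN = 0$ are exactly the two codewords $\mathbf{0} = (0, 0, \dots, 0)$ and $\mathbf{1} = (1, 1, \dots, 1)$ of the repetition code in $\mathbb{F}_2^{11}$.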


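Proof Let $NN' = (b_{ij})_{11\times 11}$, where

$$b_{ij} = \sum_{k=1}^{11} \chi_k(a_i)\chi_k(a_j)$$

is the number of blocks that contain both $a_i$ and $a_j$. When $i \neq j$, $b_{ij} = 3$; when $i = j$, $b_{ii} = 6$. So

$$NN' = 3I_{11} + 3J_{11} \equiv I_{11} + J_{11} \pmod 2.$$

We still write N for N (mod 2), a square matrix of order 11 over $\mathbb{F}_2$. For $x \in \mathbb{F}_2^{11}$ we have $x(I_{11} + J_{11}) = x + \left(\sum_i x_i\right)\mathbf{1}$. Because 11 is odd, this vanishes only for $x \in \{\mathbf{0}, \mathbf{1}\}$. Hence

$$\operatorname{rank}(N) \ge \operatorname{rank}(NN') = \operatorname{rank}(I_{11} + J_{11}) = 10.$$

Each column vector of N has exactly six 1's and five 0's, so

$$(1, 1, \dots, 1)N = (0, 0, \dots, 0) \in \mathbb{F}_2^{11},$$

and therefore $\operatorname{rank}(N) \le 10$. It follows that $\operatorname{rank}(N) = 10$, and the solution space of $XN = 0$ is a one-dimensional linear subspace of $\mathbb{F}_2^{11}$. So $XN = 0$ has exactly two solutions:

$$X = (0, 0, \dots, 0), \qquad X = (1, 1, \dots, 1).$$

The Lemma holds.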
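Lemma 2.15 can be made concrete with one specific design. The sketch below assumes the following $2$-$(11,6,3)$ design for illustration: the blocks are $B_j = D + j \pmod{11}$ with $D = \{0, 2, 6, 7, 8, 10\}$, i.e. 0 together with the non-squares modulo 11 (the complement of the quadratic-residue design). It builds N, checks $NN' = 3I_{11} + 3J_{11}$, and computes the rank of N over $\mathbb{F}_2$ by Gaussian elimination. The function names are hypothetical.

```python
# Hypothetical concrete 2-(11,6,3) design: blocks B_j = D + j (mod 11),
# where D = {0} together with the non-squares mod 11.
D = {0, 2, 6, 7, 8, 10}

def occurrence_matrix():
    """N[i][j] = 1 iff point i lies in block B_j = D + j (mod 11)."""
    return [[1 if (i - j) % 11 in D else 0 for j in range(11)] for i in range(11)]

def rank_gf2(rows):
    """Rank over F_2 by Gaussian elimination on rows stored as int bitmasks."""
    pivots = {}                                   # leading bit -> reduced row
    for row in rows:
        v = int("".join(map(str, row)), 2)
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]                      # eliminate the leading bit
    return len(pivots)

N = occurrence_matrix()
NNt = [[sum(N[i][k] * N[j][k] for k in range(11)) for j in range(11)] for i in range(11)]
print(all(NNt[i][j] == (6 if i == j else 3) for i in range(11) for j in range(11)))  # NN' = 3I + 3J
print(rank_gf2(N))                                                   # rank of N over F_2
print([sum(N[i][j] for i in range(11)) % 2 for j in range(11)] == [0] * 11)  # (1,...,1)N = 0
```

For this design the three lines print True, 10 and True.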
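Next, let us construct a $12 \times 24$ matrix $G = (I_{12}, P)$, where

$$P = \begin{pmatrix} 0 & 1 & \cdots & 1 \\ 1 & & & \\ \vdots & & N & \\ 1 & & & \end{pmatrix}, \qquad G = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{12} \end{pmatrix},$$

and $\alpha_i \in \mathbb{F}_2^{24}$ $(1 \le i \le 12)$ are the 12 row vectors of G. The weights are clearly

$$w(\alpha_1) = 12, \qquad w(\alpha_i) = 8, \quad 2 \le i \le 12. \tag{2.23}$$

Lemma 2.16 Let $G = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_{12} \end{pmatrix}$. Then $\{\alpha_1, \alpha_2, \dots, \alpha_{12}\} \subset \mathbb{F}_2^{24}$ is a linearly independent set, and every nonzero linear combination has weight at least 8, that is

$$w(a_1\alpha_1 + a_2\alpha_2 + \cdots + a_{12}\alpha_{12}) \ge 8, \quad a_i \in \mathbb{F}_2 \text{ not all zero}. \tag{2.24}$$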


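Proof We first prove that $\{\alpha_i\}_{1\le i\le 12}$ is a set of mutually orthogonal vectors, that is, the inner product satisfies $\langle \alpha_i, \alpha_j\rangle = \alpha_i\alpha_j' = 0$. Clearly

$$\langle \alpha_1, \alpha_j\rangle = \alpha_1\alpha_j' = 6 \equiv 0 \pmod 2, \quad j \neq 1.$$

If $i \neq 1$, $j \neq 1$ and $i \neq j$, then

$$\langle \alpha_i, \alpha_j\rangle = 1 + \sum_{k=1}^{11} \chi_k(a_{i-1})\chi_k(a_{j-1}) = 1 + 3 = 4 \equiv 0 \pmod 2.$$

So $\langle \alpha_i, \alpha_j\rangle = 0$ whenever $i \neq j$. By (2.23), also $\langle \alpha_i, \alpha_i\rangle = w(\alpha_i) \equiv 0 \pmod 2$. Linear independence of $\{\alpha_1, \dots, \alpha_{12}\}$ in $\mathbb{F}_2^{24}$ follows from the identity block: the first 12 components of $a_1\alpha_1 + \cdots + a_{12}\alpha_{12}$ are exactly $a_1, \dots, a_{12}$.

Now let $a_i \in \mathbb{F}_2$, not all zero, and put $a = a_1a_2\cdots a_{12}$. We prove (2.24) by induction on $w(a)$. If $w(a) = 1$, the proposition holds by (2.23). If $w(a) \ge 8$, the proposition is trivial, since the first 12 components already contribute weight $w(a)$. For $2 \le w(a) \le 7$ one can still prove

$$w(a_1\alpha_1 + a_2\alpha_2 + \cdots + a_{12}\alpha_{12}) \ge 8.$$

So the Lemma holds.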
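The proof above leaves the cases $2 \le w(a) \le 7$ to the reader. For one concrete design they can be confirmed by trying all $2^{12} - 1$ nonzero combinations. The sketch below assumes the same hypothetical design as before (blocks $D + j \bmod 11$ with $D = \{0, 2, 6, 7, 8, 10\}$) and stores codewords as integer bitmasks. It is a brute-force check, not a proof.

```python
from itertools import product

# Hypothetical concrete 2-(11,6,3) design: blocks D + j (mod 11).
D = {0, 2, 6, 7, 8, 10}
N = [[1 if (i - j) % 11 in D else 0 for j in range(11)] for i in range(11)]

def golay24_rows():
    """Rows alpha_1..alpha_12 of G = (I_12, P) with P = [[0, 1...1], [1', N]]."""
    P = [[0] + [1] * 11] + [[1] + N[i] for i in range(11)]
    return [[1 if k == i else 0 for k in range(12)] + P[i] for i in range(12)]

def min_weight(rows):
    """Minimum weight over all nonzero F_2-combinations of the rows."""
    masks = [int("".join(map(str, r)), 2) for r in rows]
    best = len(rows[0])
    for coeffs in product((0, 1), repeat=len(masks)):
        if any(coeffs):
            v = 0
            for c, m in zip(coeffs, masks):
                if c:
                    v ^= m            # addition over F_2 is XOR
            best = min(best, bin(v).count("1"))
    return best

rows = golay24_rows()
print([sum(r) for r in rows])   # weights (2.23): 12 followed by eleven 8's
print(min_weight(rows))         # minimum nonzero weight of G_24
```

The second line prints 8, in agreement with (2.24).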


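Definition 2.11 The linear code [24, 12] generated in $\mathbb{F}_2^{24}$ by the row vectors $\{\alpha_1, \alpha_2, \dots, \alpha_{12}\}$ of G is called the Golay code and is denoted $G_{24}$. Remove the last component of each $\alpha_i$, writing $\alpha_i \to \tilde\alpha_i$, so that $\tilde\alpha_i \in \mathbb{F}_2^{23}$. The linear code [23, 12] generated by $\{\tilde\alpha_1, \tilde\alpha_2, \dots, \tilde\alpha_{12}\}$ in $\mathbb{F}_2^{23}$ is also called the Golay code and is denoted $G_{23}$.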
Theorem 2.7 The Golay code $G_{23}$ is a perfect code [23, 12] with minimum distance $d = 7$.

Proof The minimum distance of a linear code equals its minimum weight. By Lemma 2.16,

$$w(a_1\tilde\alpha_1 + a_2\tilde\alpha_2 + \cdots + a_{12}\tilde\alpha_{12}) \ge w(a_1\alpha_1 + a_2\alpha_2 + \cdots + a_{12}\alpha_{12}) - 1 \ge 7.$$

On the one hand, $w(\alpha_i) = 8$ for every $i \neq 1$. The last column of N contains six 1's, so for some $i \neq 1$ the last component of $\alpha_i$ is 1, and then $w(\tilde\alpha_i) = w(\alpha_i) - 1 = 7$. So the minimum distance of $G_{23}$ is $d = 7$. On the other hand,

$$2^{12}\sum_{i=0}^{3}\binom{23}{i} = 2^{12}(1 + 23 + 253 + 1771) = 2^{12}\cdot 2^{11} = 2^{23}.$$

By the sphere-packing condition of Theorem 2.1, $G_{23}$ is a perfect code. The theorem holds.
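The next sketch uses the same assumed design (blocks $D + j \bmod 11$, $D = \{0, 2, 6, 7, 8, 10\}$). It punctures $G_{24}$ as in Definition 2.11, confirms by exhaustive search that the minimum weight of $G_{23}$ is 7, and evaluates the sphere-packing count used in the proof.

```python
from itertools import product
from math import comb

D = {0, 2, 6, 7, 8, 10}   # hypothetical 2-(11,6,3) design, blocks D + j (mod 11)
N = [[1 if (i - j) % 11 in D else 0 for j in range(11)] for i in range(11)]
P = [[0] + [1] * 11] + [[1] + N[i] for i in range(11)]
G24 = [[1 if k == i else 0 for k in range(12)] + P[i] for i in range(12)]
G23 = [row[:-1] for row in G24]          # delete the last component of each row

def min_weight(rows):
    """Minimum weight over all nonzero F_2-combinations of the rows."""
    masks = [int("".join(map(str, r)), 2) for r in rows]
    best = len(rows[0])
    for coeffs in product((0, 1), repeat=len(masks)):
        if any(coeffs):
            v = 0
            for c, m in zip(coeffs, masks):
                if c:
                    v ^= m
            best = min(best, bin(v).count("1"))
    return best

ball = sum(comb(23, i) for i in range(4))     # |B_3(0)| in F_2^23
print(min_weight(G23), ball, 2 ** 12 * ball == 2 ** 23)
```

This prints `7 2048 True`, matching $d = 7$ and $\sum_{i\le 3}\binom{23}{i} = 2^{11}$.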


2.4.3 3-Ary Golay Code 


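To introduce the 3-ary Golay code, we first define the Paley matrix of order q. Let q be a power of an odd prime, and define the second-order (quadratic) real-valued multiplicative character $\chi(a)$ on the finite field $\mathbb{F}_q$ as

$$\chi(a) = \begin{cases} 0, & \text{if } a = 0; \\ 1, & \text{if } a \in (\mathbb{F}_q^*)^2; \\ -1, & \text{if } a \notin (\mathbb{F}_q^*)^2,\ a \neq 0. \end{cases}$$

Clearly χ is a character of $\mathbb{F}_q^*$. Since $\mathbb{F}_q^*$ is a cyclic multiplicative group of order $q - 1$, we have

$$\chi(-1) = (-1)^{\frac{q-1}{2}} = \begin{cases} 1, & \text{if } q \equiv 1 \pmod 4; \\ -1, & \text{if } q \equiv 3 \pmod 4. \end{cases}$$

Write $\mathbb{F}_q = \{a_0, a_1, \dots, a_{q-1}\}$ with $a_0 = 0$. The Paley matrix $S_q$ of order q is defined as

$$S_q = (\chi(a_i - a_j))_{q\times q} = \begin{pmatrix} 0 & \chi(-a_1) & \chi(-a_2) & \cdots & \chi(-a_{q-1}) \\ \chi(a_1) & 0 & \chi(a_1 - a_2) & \cdots & \chi(a_1 - a_{q-1}) \\ \chi(a_2) & \chi(a_2 - a_1) & 0 & \cdots & \chi(a_2 - a_{q-1}) \\ \vdots & \vdots & \vdots & & \vdots \\ \chi(a_{q-1}) & \chi(a_{q-1} - a_1) & \chi(a_{q-1} - a_2) & \cdots & 0 \end{pmatrix}.$$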
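Lemma 2.17 The Paley matrix $S_q$ of order q has the following properties:
(i) $S_qJ_q = J_qS_q = 0$.
(ii) $S_qS_q' = qI_q - J_q$.
(iii) $S_q' = (-1)^{\frac{q-1}{2}}S_q$.
Here $I_q$ is the identity matrix of order q and $J_q$ is the square matrix of order q whose elements are all 1.

Proof Let $S_qJ_q = (b_{ij})_{q\times q}$. For all $0 \le i \le q-1$ and $0 \le j \le q-1$,

$$b_{ij} = \sum_{k=0}^{q-1}\chi(a_i - a_k) = \sum_{c\in\mathbb{F}_q}\chi(c) = 0,$$

because $\mathbb{F}_q^*$ contains equally many squares and non-squares. The same argument applies to $J_qS_q$, so (i) holds. To prove (ii), let $S_qS_q' = (c_{ij})_{q\times q}$. Then

$$c_{ij} = \sum_{k=0}^{q-1}\chi(a_i - a_k)\chi(a_j - a_k).$$

Clearly $c_{ii} = q - 1$. For $i \neq j$, put $c = a_j - a_i \neq 0$ and $x = a_i - a_k$, so that $a_j - a_k = x + c$. For $x \neq 0$ we have $\chi(x)\chi(x + c) = \chi(x^2)\chi(1 + cx^{-1}) = \chi(1 + cx^{-1})$, and as x runs over $\mathbb{F}_q^*$, $y = 1 + cx^{-1}$ runs over $\mathbb{F}_q \setminus \{1\}$. Hence

$$c_{ij} = \sum_{y\neq 1}\chi(y) = -\chi(1) = -1,$$

so

$$c_{ij} = \begin{cases} q - 1, & \text{if } i = j; \\ -1, & \text{if } i \neq j, \end{cases}$$

and (ii) holds. To prove (iii), note that $\chi(-1) = (-1)^{\frac{q-1}{2}}$, so

$$S_q' = (\chi(a_j - a_i))_{q\times q} = \chi(-1)S_q = (-1)^{\frac{q-1}{2}}S_q.$$

The Lemma holds.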
Let q = 5 and consider the Paley matrix $S_5$ of order 5. A direct calculation gives

$$S_5 = \begin{pmatrix} 0 & 1 & -1 & -1 & 1 \\ 1 & 0 & 1 & -1 & -1 \\ -1 & 1 & 0 & 1 & -1 \\ -1 & -1 & 1 & 0 & 1 \\ 1 & -1 & -1 & 1 & 0 \end{pmatrix}.$$

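In $\mathbb{F}_3^{11}$ we consider the linear code C whose generator matrix is

$$G = (I_6, B), \qquad B = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ & & S_5 & & \end{pmatrix}_{6\times 5},$$

that is, B is the $6 \times 5$ matrix whose first row is (1, 1, 1, 1, 1) and whose last five rows form $S_5$.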
So C is a six-dimensional linear subspace of $\mathbb{F}_3^{11}$, that is, $C = [11, 6]$. This code is called the 3-ary Golay code. To discuss the 3-ary Golay code [11, 6] further, we introduce the concept of the extended code of a linear code.

If $C \subset \mathbb{F}_q^n$ is a q-ary linear code of length n, the extended code $\bar C$ of C is defined as

$$\bar C = \Big\{(c_1, c_2, \dots, c_n, c_{n+1}) \;\Big|\; (c_1, c_2, \dots, c_n) \in C \text{ and } \sum_{i=1}^{n+1} c_i = 0\Big\}.$$

Clearly $\bar C \subset \mathbb{F}_q^{n+1}$ is a linear code.


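Lemma 2.18 Let $C \subset \mathbb{F}_q^n$ be a linear code with generator matrix G and check matrix H. Then the extended code $\bar C \subset \mathbb{F}_q^{n+1}$ has length n + 1, and its generator matrix $\bar G$ and check matrix $\bar H$ are

$$\bar G = [G, B] \qquad \text{and} \qquad \bar H = \begin{pmatrix} \mathbf{1} & 1 \\ H & \mathbf{0}' \end{pmatrix},$$

respectively. Here $\mathbf{1} = (1, 1, \dots, 1) \in \mathbb{F}_q^n$, $\mathbf{0}'$ is a zero column, and B is a column vector chosen so that the sum of all column vectors of G and B is 0. Further, let q = 2. If the minimum distance d of C is odd, then the minimum distance of $\bar C$ is d + 1.

Proof The generator matrix and check matrix of $\bar C$ follow directly from the definition. Let $c = c_1c_2\cdots c_n \in C$ be a codeword of minimal weight $w = w(c) = d$. Because q = 2, we have $\sum_{i=1}^{n} c_i = w \pmod 2$. Since w is odd, this sum is nonzero, so $c_{n+1} = 1$ and

$$c^* = c_1c_2\cdots c_{n+1} \in \bar C, \qquad w(c^*) = d + 1.$$

Every other nonzero codeword of $\bar C$ comes from a codeword of C of weight $w \ge d$. If w is odd, its extension has weight $w + 1 \ge d + 1$. If w is even, then $w \ge d + 1$ because d is odd. So d + 1 is the minimal weight of $\bar C$. The lemma is proved.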


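Consider the extended code $\bar C = [12, 6]$ of the 3-ary Golay code $C = [11, 6]$. Its generator matrix is

$$\bar G = \begin{pmatrix} I_6 & \begin{matrix} \mathbf{1} & 0 \\ S_5 & -\mathbf{1}' \end{matrix} \end{pmatrix}, \tag{2.25}$$

where $\mathbf{1} = (1, 1, 1, 1, 1)$ and $-\mathbf{1}' = (-1, -1, -1, -1, -1)'$. The last column makes every row sum equal to 0.

The components of each row vector of $S_5$ sum to 0. The inner product of two different row vectors of $S_5$ is $-1$, and the inner product of a row vector of $S_5$ with itself is $4 \equiv 1 \pmod 3$. Therefore

$$\bar G\bar G' = 0.$$

Since $\dim\bar C = 6 = 12 - 6$, the extended code $\bar C$ is a self-dual code, that is, $(\bar C)^\perp = \bar C$.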
Theorem 2.8 The 3-ary Golay code C is a perfect linear code [11, 6] with minimum distance 5, so it is a 2-error-correcting code.

Proof Each row vector of $\bar G$ has weight 6, and a direct calculation shows that every nonzero linear combination of the row vectors of $\bar G$ has weight at least 6. So the minimum distance of the extended code $\bar C$ is 6. Deleting the last coordinate lowers a weight by at most 1, and rows 2 to 6 of G have weight 5, so the minimum distance of C is 5. Hence the disjoint radius of C is $\rho_1 = 2$. Moreover,

$$\sum_{i=0}^{2}\binom{11}{i}2^i = 1 + 22 + 220 = 243 = 3^5,$$

so the sphere-packing condition

$$3^6\sum_{i=0}^{2}\binom{11}{i}2^i = 3^{11}$$

is satisfied. By Theorem 2.1, C is a perfect code, and the theorem holds.
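The ternary case can also be checked by brute force, because C has only $3^6 = 729$ codewords. The sketch below assumes the generator matrix $G = (I_6, B)$, where B is the row (1, 1, 1, 1, 1) stacked on top of $S_5$. It reduces entries to $\{0, 1, 2\}$ and extends each codeword by the zero-sum rule. The helper names are hypothetical.

```python
from itertools import product
from math import comb

S5 = [[0, 1, -1, -1, 1], [1, 0, 1, -1, -1], [-1, 1, 0, 1, -1],
      [-1, -1, 1, 0, 1], [1, -1, -1, 1, 0]]
B = [[1, 1, 1, 1, 1]] + S5                           # row (1,1,1,1,1) above S_5
G = [[(1 if k == i else 0) for k in range(6)] + B[i] for i in range(6)]
G = [[x % 3 for x in row] for row in G]              # entries in F_3 = {0, 1, 2}

def span_f3(rows):
    """All 3^k codewords a_1 r_1 + ... + a_k r_k over F_3."""
    n = len(rows[0])
    return [tuple(sum(a * r[j] for a, r in zip(coeffs, rows)) % 3 for j in range(n))
            for coeffs in product(range(3), repeat=len(rows))]

def extend(word):
    """Append c_{n+1} so that all components sum to 0 (mod 3)."""
    return word + ((-sum(word)) % 3,)

def wt(word):
    return sum(x != 0 for x in word)

C = span_f3(G)
nonzero = [c for c in C if any(c)]
ball = sum(comb(11, i) * 2 ** i for i in range(3))   # |B_2(0)| in F_3^11
print(len(set(C)), min(wt(c) for c in nonzero), min(wt(extend(c)) for c in nonzero))
print(ball, 3 ** 6 * ball == 3 ** 11)
```

This reports 729 codewords, minimum weight 5 for C and 6 for $\bar C$, and the sphere-packing equality $3^6 \cdot 243 = 3^{11}$.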


Remark 2.1 J. H. van Lint in 1971 (see reference 2 [24]) and A. Tietäväinen in 1973 (see reference 2 [43]) independently proved that, over any finite field, the only nontrivial perfect codes with minimal distance greater than 3 are the 2-ary Golay code $G_{23}$ and the 3-ary Golay code.


2.4.4 Reed-Muller Codes 


Reed and Muller proposed a class of 2-ary linear codes based on finite geometry in 
1954. In order to discuss the structure and properties of these codes, we first prove 
some results in number theory. 


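Lemma 2.19 Let p be a prime, and let k, n be two nonnegative integers with p-ary expansions

$$n = \sum_{i=0}^{l} n_ip^i, \qquad k = \sum_{i=0}^{l} k_ip^i, \qquad 0 \le n_i, k_i \le p - 1.$$

Then

$$\binom{n}{k} \equiv \prod_{i=0}^{l}\binom{n_i}{k_i} \pmod p, \quad \text{where } \binom{n_i}{k_i} = 0 \text{ if } k_i > n_i.$$

Proof If k = 0, then every $k_i = 0$, so the formula holds. If n = k, then $n_i = k_i$ for all i, and the formula also holds. We may therefore assume $1 \le k < n$. Recall the polynomial congruence

$$(1 + x)^p \equiv 1 + x^p \pmod p.$$

It gives

$$(1 + x)^n = (1 + x)^{\sum_{i=0}^{l} n_ip^i} = \prod_{i=0}^{l}\left((1 + x)^{p^i}\right)^{n_i} \equiv \prod_{i=0}^{l}(1 + x^{p^i})^{n_i} \pmod p.$$

Compare the coefficients of $x^k$ on both sides. The monomials on the right are $x^{\sum_i j_ip^i}$ with $0 \le j_i \le n_i$. By the uniqueness of the p-ary expansion, such a monomial equals $x^k$ only when $j_i = k_i$ for all i. Suppose some $k_i > n_i$. Then $x^k$ does not appear on the right-hand side, so the coefficient of $x^k$ on the left-hand side satisfies

$$\binom{n}{k} \equiv 0 \pmod p.$$

If $k_i \le n_i$ for all $0 \le i \le l$, the coefficient of $x^k$ on the right-hand side is $\prod_{i=0}^{l}\binom{n_i}{k_i}$, hence

$$\binom{n}{k} \equiv \prod_{i=0}^{l}\binom{n_i}{k_i} \pmod p.$$

This completes the proof of the Lemma.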
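Lemma 2.19 also reads as an algorithm: reduce $\binom{n}{k} \bmod p$ one base-p digit at a time. The sketch below is an illustration with a hypothetical helper name. It compares the digit-wise computation against Python's exact `math.comb`.

```python
from math import comb

def lucas_binom_mod(n, k, p):
    """C(n, k) mod p computed digit by digit, as in Lemma 2.19 (p prime)."""
    result = 1
    while n or k:
        n_i, k_i = n % p, k % p
        if k_i > n_i:
            return 0                      # a factor C(n_i, k_i) = 0
        result = result * comb(n_i, k_i) % p
        n, k = n // p, k // p
    return result

# Compare with the direct computation for small n, k and a few primes.
print(all(lucas_binom_mod(n, k, p) == comb(n, k) % p
          for p in (2, 3, 5, 7) for n in range(60) for k in range(n + 1)))
```

With p = 2 the lemma says that $\binom{n}{k}$ is odd exactly when the binary digits of k lie below those of n. This special case is used repeatedly in the Reed-Muller discussion.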


Massey defined the concept of the weight of a polynomial for the first time in 1973. Over a finite field $\mathbb{F}_q$ of characteristic 2 (q a power of 2), the Hamming weight of a polynomial $f(x) \in \mathbb{F}_q[x]$ is defined as

$$w(f(x)) = \text{the number of nonzero coefficients of } f(x).$$

Lemma 2.20 (Massey, 1973) Let $f(x) = \sum_{i=0}^{l} b_i(x + c)^i \in \mathbb{F}_q[x]$ with $b_l \neq 0$, and let $i_0$ be the smallest subscript i with $b_i \neq 0$. Then

$$w(f(x)) \ge w((x + c)^{i_0}).$$


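Proof We argue by induction on l. If l = 0, then $i_0 = 0$ and the lemma holds. If c = 0, then $w((x + c)^{i_0}) = w(x^{i_0}) = 1 \le w(f(x))$, so we may assume $c \neq 0$. Suppose the lemma holds for all $l < 2^m$, and consider $2^m \le l < 2^{m+1}$. Write f(x) as

$$f(x) = \sum_{i=0}^{2^m-1} b_i(x + c)^i + \sum_{i=2^m}^{l} b_i(x + c)^i = f_1(x) + (x + c)^{2^m}f_2(x) = f_1(x) + c^{2^m}f_2(x) + x^{2^m}f_2(x),$$

where $f_2(x) = \sum_{i=2^m}^{l} b_i(x + c)^{i-2^m}$, $\deg f_1(x) < 2^m$, $\deg f_2(x) < 2^m$, and we used $(x + c)^{2^m} = x^{2^m} + c^{2^m}$ in characteristic 2. There are two situations to discuss.

(i) If $f_1(x) = 0$, then $f(x) = (x^{2^m} + c^{2^m})f_2(x)$. Since $\deg f_2(x) < 2^m$, the two parts share no monomial, so $w(f(x)) = 2w(f_2(x))$. Because $i_0 \ge 2^m$,

$$w((x + c)^{i_0}) = w((x^{2^m} + c^{2^m})(x + c)^{i_0-2^m}) = 2w((x + c)^{i_0-2^m}).$$

The smallest subscript of $f_2(x)$ is $i_0 - 2^m$, so the inductive hypothesis gives

$$w(f_2(x)) \ge w((x + c)^{i_0-2^m}).$$

Therefore

$$w(f(x)) = 2w(f_2(x)) \ge 2w((x + c)^{i_0-2^m}) = w((x + c)^{i_0}).$$

(ii) If $f_1(x) \neq 0$, the monomials of $f_1(x) + c^{2^m}f_2(x)$ have degree less than $2^m$, and those of $x^{2^m}f_2(x)$ have degree at least $2^m$. If a nonzero term of $f_1(x)$ is cancelled by the corresponding term of $c^{2^m}f_2(x)$, then $x^{2^m}f_2(x)$ has a corresponding nonzero term. So we always have

$$w(f(x)) \ge w(f_1(x)), \qquad w(f(x)) \ge w(f_2(x)).$$

Every subscript occurring in $f_1(x)$ is smaller than $2^m$, so the smallest subscript $i_1$ of $f_1(x)$ equals $i_0$. From the inductive hypothesis,

$$w(f(x)) \ge w(f_1(x)) \ge w((x + c)^{i_1}) = w((x + c)^{i_0}).$$

So in either case the Lemma holds.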
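A randomized check of Lemma 2.20 is sketched below. It assumes $\mathbb{F}_2$ and $c = 1$ so that polynomials can be stored as integer bitmasks, with bit i holding the coefficient of $x^i$. The lemma itself covers any field of characteristic 2 and any c. Passing random trials is evidence, not a proof. The function names are hypothetical.

```python
import random

def poly_mul(a, b):
    """Product of two polynomials in F_2[x] stored as int bitmasks (bit i = coeff of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def weight(poly):
    """Hamming weight: number of nonzero coefficients."""
    return bin(poly).count("1")

def check_massey(trials=2000, max_degree=40, seed=1):
    """Random test of w(sum b_i (x+1)^i) >= w((x+1)^{i0}) over F_2 with c = 1."""
    rng = random.Random(seed)
    powers = [1]                                   # powers[i] = (x + 1)^i
    for _ in range(max_degree):
        powers.append(poly_mul(powers[-1], 0b11))
    for _ in range(trials):
        i0 = rng.randrange(max_degree + 1)         # smallest subscript with b_i != 0
        f = powers[i0]
        for i in range(i0 + 1, max_degree + 1):
            if rng.random() < 0.5:                 # b_i chosen at random for i > i0
                f ^= powers[i]
        if weight(f) < weight(powers[i0]):
            return False
    return True

print(check_massey())
```

Over $\mathbb{F}_2$, Lemma 2.19 gives $w((x+1)^i) = 2^u$, where u is the number of 1's in the binary expansion of i. This is exactly the quantity that appears in the Reed-Muller distance bound.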


Next, we use Massey's method to construct Reed-Muller codes. Let $m \ge 1$, and let $\mathbb{F}_2^m$ be the m-dimensional affine space, denoted AG(m, 2). A point $\alpha \in AG(m, 2)$ is written as an m-dimensional column vector. Let $\{u_0, u_1, \dots, u_{m-1}\}$ be the standard basis of $\mathbb{F}_2^m$, that is

$$\alpha = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{m-1} \end{pmatrix}, \quad u_0 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad u_{m-1} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},$$

where $a_i = 0$ or 1. We establish a one-to-one correspondence between the integers j with $0 \le j < 2^m$ and the points of AG(m, 2). For $0 \le j < 2^m$, write
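$$j = \sum_{i=0}^{m-1} a_{ij}2^i, \qquad a_{ij} \in \mathbb{F}_2.$$

We define

$$x_j = \sum_{i=0}^{m-1} a_{ij}u_i = \begin{pmatrix} a_{0j} \\ a_{1j} \\ \vdots \\ a_{(m-1)j} \end{pmatrix} \in \mathbb{F}_2^m.$$

Since $x_{j_1} \neq x_{j_2}$ whenever $j_1 \neq j_2$, the set $\{x_j \mid 0 \le j < 2^m\}$ contains all the points of $\mathbb{F}_2^m$. Write $n = 2^m$ and consider the matrix

$$E = [x_0, x_1, \dots, x_{n-1}] = \begin{pmatrix} a_{00} & a_{01} & \cdots & a_{0(n-1)} \\ a_{10} & a_{11} & \cdots & a_{1(n-1)} \\ \vdots & \vdots & & \vdots \\ a_{(m-1)0} & a_{(m-1)1} & \cdots & a_{(m-1)(n-1)} \end{pmatrix}_{m\times n}.$$

Each row vector $\alpha_i = (a_{i0}, a_{i1}, \dots, a_{i(n-1)})$ $(0 \le i \le m-1)$ of E is a vector of $\mathbb{F}_2^n$, and we write

$$E = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{m-1} \end{pmatrix} = (a_{ij})_{m\times n} \quad (0 \le i < m,\ 0 \le j < 2^m = n).$$

For each i, $0 \le i < m$, define a linear subspace of $\mathbb{F}_2^m$,

$$B_i = \{x_j \in \mathbb{F}_2^m \mid a_{ij} = 0\}.$$

Clearly $B_i$ is a linear subspace, and an additive coset of $B_i$ is called an $(m-1)$-dimensional flat in $\mathbb{F}_2^m$. We consider $A_i = B_i + u_i$,

$$A_i = \{x_j \in \mathbb{F}_2^m \mid a_{ij} = 1,\ 0 \le j < n\}, \qquad |A_i| = 2^{m-1}.$$

We define the characteristic function $\chi_i(\alpha)$ on $\mathbb{F}_2^m$ according to $A_i$,

$$\chi_i(\alpha) = \begin{cases} 1, & \text{if } \alpha \in A_i; \\ 0, & \text{if } \alpha \notin A_i, \end{cases}$$

where $\alpha \in \mathbb{F}_2^m$. Each row vector $\alpha_i$ $(0 \le i < m)$ of E can then be expressed as

$$\alpha_i = (\chi_i(x_0), \chi_i(x_1), \dots, \chi_i(x_{n-1})).$$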
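For any two vectors $\alpha = (b_0, b_1, \dots, b_{n-1})$ and $\beta = (c_0, c_1, \dots, c_{n-1})$ in $\mathbb{F}_2^n$, define the product vector

$$\alpha\beta = (b_0c_0, b_1c_1, \dots, b_{n-1}c_{n-1}) \in \mathbb{F}_2^n.$$

For $0 \le i_1, i_2 < m$, the product of the row vectors $\alpha_{i_1}$ and $\alpha_{i_2}$ of E is

$$\alpha_{i_1}\alpha_{i_2} = (\chi_{i_1}(x_0)\chi_{i_2}(x_0), \chi_{i_1}(x_1)\chi_{i_2}(x_1), \dots, \chi_{i_1}(x_{n-1})\chi_{i_2}(x_{n-1})).$$

So the j-th component $(0 \le j < 2^m)$ of $\alpha_{i_1}\alpha_{i_2}$ is

$$\chi_{i_1}(x_j)\chi_{i_2}(x_j) = \begin{cases} 1, & \text{if } x_j \in A_{i_1} \cap A_{i_2}; \\ 0, & \text{otherwise}. \end{cases}$$

From the definition of $A_i$ it is clear that $|A_{i_1} \cap A_{i_2}| = 2^{m-2}$ when $i_1 \neq i_2$.

Lemma 2.21 Let $i_1, i_2, \dots, i_s$ be s different indices $(0 \le s \le m)$ from $\{0, 1, \dots, m-1\}$. Then

$$|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s}| = 2^{m-s},$$

and the vector $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \in \mathbb{F}_2^n$ has weight

$$w(\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}) = 2^{m-s}.$$

Proof The first conclusion is clear: $x_j$ lies in the intersection exactly when the s coordinates $a_{i_1j}, \dots, a_{i_sj}$ are all 1, while the remaining m − s coordinates are free. For the second conclusion, the vector

$$\alpha = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (\chi_{i_1}(x_0)\cdots\chi_{i_s}(x_0), \dots, \chi_{i_1}(x_{n-1})\cdots\chi_{i_s}(x_{n-1}))$$

has exactly $2^{m-s}$ components equal to 1, namely those j with $x_j \in A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s}$, and all other components are 0. So

$$w(\alpha) = w(\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}) = 2^{m-s},$$

and the Lemma holds.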
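Lemma 2.21 can be checked directly for a small m. In the sketch below, which is an illustration with hypothetical helper names, $a_{ij}$ is binary digit i of j, as in the definition of E. The empty product is taken to be $e = (1, \dots, 1)$.

```python
from itertools import combinations

def rows_of_E(m):
    """alpha_i = (a_{i0}, ..., a_{i(n-1)}), where a_{ij} is binary digit i of j."""
    n = 2 ** m
    return [[(j >> i) & 1 for j in range(n)] for i in range(m)]

def product_vector(vectors, n):
    """Componentwise product; the empty product is e = (1, ..., 1)."""
    out = [1] * n
    for v in vectors:
        out = [a * b for a, b in zip(out, v)]
    return out

m = 4
n = 2 ** m
E = rows_of_E(m)
ok = all(sum(product_vector([E[i] for i in idx], n)) == 2 ** (m - s)
         for s in range(m + 1) for idx in combinations(range(m), s))
print(ok)
```

For m = 4 every product of s distinct rows has weight $2^{4-s}$, so the script prints True.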
For $0 \le l < 2^m$, write $l = \sum_{i=0}^{m-1} a_{il}2^i$ and define the indicator set I(l) as

$$I(l) = \{i_1, i_2, \dots, i_s\} = \{i \mid 0 \le i \le m-1,\ a_{il} = 0\}.$$

The following properties of the indicator set I(l) are clear:
(i) If $l_1 \neq l_2$, then $I(l_1) \neq I(l_2)$.
(ii) $\bigcup_{0\le l<n} I(l) = \{0, 1, 2, \dots, m-1\}$.
(iii) If $l = n - 1$, then $I(n-1)$ is the empty set.

These properties are easy to verify. For example, for (iii): because $l = n - 1 = 2^m - 1 = 1 + 2 + 2^2 + \cdots + 2^{m-1}$, there is no subscript i with $a_{il} = 0$, that is, $I(n-1) = \emptyset$. We sometimes write an indicator set as $I(l) = \{i_1, i_2, \dots, i_s\}$.


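Lemma 2.22 Let $0 \le l < n = 2^m$ and $I(l) = \{i_1, i_2, \dots, i_s\}$, and suppose

$$\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (b_{l0}, b_{l1}, \dots, b_{l(n-1)}) \in \mathbb{F}_2^n.$$

Then in the ring $\mathbb{F}_2[x]$,

$$(1 + x)^l = \sum_{j=0}^{n-1} b_{lj}x^{n-1-j}. \tag{2.26}$$

Proof For $0 \le j < n$, write $j = \sum_{i=0}^{m-1} a_{ij}2^i$. Then

$$n - 1 - j = \sum_{i=0}^{m-1} c_{ij}2^i, \qquad c_{ij} = 1 - a_{ij}.$$

By Lemma 2.19,

$$\binom{l}{n-1-j} \equiv \prod_{i=0}^{m-1}\binom{a_{il}}{c_{ij}} \pmod 2.$$

Hence $\binom{l}{n-1-j} \equiv 1 \pmod 2$ if and only if $c_{ij} \le a_{il}$ for every i. In particular, whenever $a_{il} = 0$ we must have $c_{ij} = 0$, i.e. $a_{ij} = 1$. In other words,

$$\binom{l}{n-1-j} \equiv 1 \pmod 2 \iff a_{ij} = 1 \text{ for all } i \in I(l).$$

On the other hand, from Lemma 2.21,

$$b_{lj} = 1 \iff x_j \in A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_s} \iff a_{ij} = 1 \text{ for all } i \in I(l).$$

Comparing the coefficients of $x^{n-1-j}$ on both sides of (2.26), we obtain

$$(1 + x)^l = \sum_{j=0}^{n-1}\binom{l}{n-1-j}x^{n-1-j} = \sum_{j=0}^{n-1} b_{lj}x^{n-1-j}.$$

The Lemma holds.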
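Identity (2.26) can be verified mechanically for small m. The sketch below uses a hypothetical helper. It computes $b_{lj}$ from the rule "$a_{ij} = 1$ for every $i \in I(l)$" and compares the result with the coefficients $\binom{l}{n-1-j} \bmod 2$ of $(1 + x)^l$. Here `math.comb` returns 0 when the lower index exceeds the upper one.

```python
from math import comb

def check_lemma_2_22(m):
    n = 2 ** m
    for l in range(n):
        zeros = [i for i in range(m) if not (l >> i) & 1]            # I(l)
        # b_{lj} = 1 iff a_{ij} = 1 for every i in I(l)
        b = [int(all((j >> i) & 1 for i in zeros)) for j in range(n)]
        # coefficient of x^{n-1-j} in (1 + x)^l over F_2 is C(l, n-1-j) mod 2
        lhs = [comb(l, n - 1 - j) % 2 for j in range(n)]
        if lhs != b:
            return False
    return True

print([check_lemma_2_22(m) for m in range(1, 7)])
```

For l = n − 1 the index set is empty, and the rule gives $b = e$, in agreement with $N_{n-1} = e$.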


For any $0 \le l < n = 2^m$ with index set $I(l) = \{i_1, i_2, \dots, i_s\}$, we define the vector

$$N_l = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \in \mathbb{F}_2^n.$$

Different l have different index sets I(l), so the corresponding vectors $N_l$ are different. The index set corresponding to $l = n - 1$ is empty, and the corresponding vector $N_{n-1}$ is defined as

$$N_{n-1} = (1, 1, \dots, 1) = e.$$

Let $e_0 = (1, 0, \dots, 0), \dots, e_{n-1} = (0, 0, \dots, 1)$ be the standard basis of $\mathbb{F}_2^n$.

Lemma 2.23 For $0 \le j < n$,

$$e_j = \prod_{i=0}^{m-1}(\alpha_i + (1 + a_{ij})e),$$

where $\alpha_i$ is the i-th row of the matrix E.


Proof For a vector α in $\mathbb{F}_2^n$, its complement $\bar\alpha$ is obtained by replacing every component 1 of α with 0 and every component 0 with 1. So

$$\alpha + \bar\alpha = e = (1, 1, \dots, 1), \quad \forall \alpha \in \mathbb{F}_2^n.$$

For a given $0 \le j < n$, we define the j-th complement of the row vector $\alpha_i$ $(0 \le i < m)$ of E as

$$\alpha_i(j) = \begin{cases} \alpha_i, & \text{if } a_{ij} = 1; \\ \bar\alpha_i, & \text{if } a_{ij} = 0. \end{cases}$$

Clearly

$$\alpha_i + (1 + a_{ij})e = \alpha_i(j),$$

and from the definition of the index set I(j),

$$\alpha_i(j) = \begin{cases} \alpha_i, & \text{if } i \notin I(j); \\ \bar\alpha_i, & \text{if } i \in I(j). \end{cases}$$

Now we calculate

$$\prod_{i=0}^{m-1}(\alpha_i + (1 + a_{ij})e) = \prod_{i=0}^{m-1}\alpha_i(j) = \prod_{i\in I(j)}(e - \alpha_i)\prod_{i\notin I(j)}\alpha_i = b,$$

where $b = (b_0, b_1, \dots, b_{n-1}) \in \mathbb{F}_2^n$. Clearly $b_j = 1$. If $k \neq j$, then $a_{ik} \neq a_{ij}$ for some i, so

$$b_k = \prod_{i\notin I(j)} a_{ik}\prod_{i\in I(j)}(1 - a_{ik}) = 0.$$

Thus $b = e_j$, which completes the proof of the Lemma.
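A direct check of Lemma 2.23 for small m is sketched below. The helper name is illustrative. Each factor $\alpha_i + (1 + a_{ij})e$ is formed componentwise mod 2, and the product is compared with the standard basis vector $e_j$.

```python
def check_lemma_2_23(m):
    n = 2 ** m
    alpha = [[(k >> i) & 1 for k in range(n)] for i in range(m)]      # rows of E
    for j in range(n):
        prod = [1] * n                                               # start from e
        for i in range(m):
            a_ij = (j >> i) & 1
            factor = [(x + 1 + a_ij) % 2 for x in alpha[i]]          # alpha_i + (1 + a_ij) e
            prod = [p * f for p, f in zip(prod, factor)]
        if prod != [1 if k == j else 0 for k in range(n)]:           # must equal e_j
            return False
    return True

print([check_lemma_2_23(m) for m in range(1, 7)])
```

Each factor keeps exactly the positions k whose i-th binary digit agrees with that of j, so the product singles out k = j.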


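Lemma 2.24 $\{N_l\}_{0\le l<n}$ is a basis of $\mathbb{F}_2^n$, where $N_{n-1} = e = (1, 1, \dots, 1)$.

Proof $\{N_l\}_{0\le l<n}$ consists of exactly n different vectors, so it suffices to prove that they are linearly independent. Let

$$N_l = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (b_{l0}, b_{l1}, \dots, b_{l(n-1)}),$$

and let

$$\sum_{l=0}^{n-1} c_lN_l = \left(\sum_{l=0}^{n-1} c_lb_{l0}, \sum_{l=0}^{n-1} c_lb_{l1}, \dots, \sum_{l=0}^{n-1} c_lb_{l(n-1)}\right)$$

be a linear combination with $c = (c_0, c_1, \dots, c_{n-1}) \neq 0$. The polynomials $(1 + x)^l$ $(0 \le l < n)$ have distinct degrees, so

$$f(x) = \sum_{l=0}^{n-1} c_l(1 + x)^l \in \mathbb{F}_2[x], \qquad f(x) \neq 0.$$

By Lemma 2.22,

$$f(x) = \sum_{j=0}^{n-1}\left(\sum_{l=0}^{n-1} c_lb_{lj}\right)x^{n-1-j}.$$

So some component $\sum_l c_lb_{lj}$ is nonzero, that is, $\sum_l c_lN_l \neq 0$. Hence $\{N_l\}_{0\le l<n}$ is a basis, and the Lemma holds.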


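Definition 2.12 Let $0 \le r \le m$. The Reed-Muller code R(r, m) of order r is

$$R(r, m) = L(\{\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} \mid 0 \le s \le r\}) \subset \mathbb{F}_2^n,$$

the subspace spanned by all products of at most r distinct rows of E. The vector corresponding to s = 0 is e.

When r = 0, R(0, m) is the repetition code in $\mathbb{F}_2^n$:

$$R(0, m) = \{(0, 0, \dots, 0), (1, 1, \dots, 1)\}.$$

For general r with $0 \le r \le m$, the spanning products are distinct vectors $N_l$ and hence linearly independent by Lemma 2.24. So R(r, m) is a t-dimensional linear subspace of $\mathbb{F}_2^n$, where

$$t = \sum_{s=0}^{r}\binom{m}{s}.$$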


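Lemma 2.25 The dual code of the Reed-Muller code R(r, m) of order r is R(m − r − 1, m).

Proof The dimensions of R(r, m) and R(m − r − 1, m) are

$$\dim R(r, m) = \sum_{s=0}^{r}\binom{m}{s} \quad \text{and} \quad \dim R(m-r-1, m) = \sum_{s=0}^{m-r-1}\binom{m}{s}.$$

Using $\binom{m}{s} = \binom{m}{m-s}$,

$$\sum_{s=0}^{r}\binom{m}{s} + \sum_{s=0}^{m-r-1}\binom{m}{s} = \sum_{s=0}^{r}\binom{m}{s} + \sum_{s=r+1}^{m}\binom{m}{s} = \sum_{s=0}^{m}\binom{m}{s} = 2^m = n,$$

that is,

$$\dim R(r, m) + \dim R(m-r-1, m) = n.$$

Let $\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}$ and $\alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_t}$ be basis vectors of R(r, m) and R(m − r − 1, m), respectively, and set

$$\alpha = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s}, \qquad \beta = \alpha_{j_1}\alpha_{j_2}\cdots\alpha_{j_t}.$$

By Lemma 2.21,

$$w(\alpha) = 2^{m-s}, \quad w(\beta) = 2^{m-t}, \qquad s \le r < m, \quad t \le m - r - 1.$$

Because $s + t \le m - 1 < m$, the product $\alpha\beta = \alpha_{i_1}\cdots\alpha_{i_s}\alpha_{j_1}\cdots\alpha_{j_t}$ is a product of u distinct rows of E with $u \le s + t < m$ (note that $\alpha_i\alpha_i = \alpha_i$). By Lemma 2.21 it has weight

$$w(\alpha\beta) = 2^{m-u} \equiv 0 \pmod 2,$$

so

$$\langle \alpha, \beta\rangle = w(\alpha\beta) \bmod 2 = 0.$$

Hence $R(m-r-1, m) \subset R(r, m)^\perp$. Since the dimensions add up to n, the dual code of R(r, m) is R(m − r − 1, m). The Lemma holds.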


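Theorem 2.9 The Reed-Muller code R(r, m) of order r has minimal distance $d = 2^{m-r}$. In particular, when r = m − 2, R(m − 2, m) is a linear code $[n, n-m-1]$.

Proof From Lemma 2.21,

$$w(\alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_r}) = 2^{m-r},$$

so the minimum distance of R(r, m) satisfies $d \le 2^{m-r}$. On the other hand, let $I_1(r)$ be the set of all l whose index set $I(l) = \{i_1, i_2, \dots, i_s\}$ satisfies $s \le r$, and write

$$N_l = \alpha_{i_1}\alpha_{i_2}\cdots\alpha_{i_s} = (b_{l0}, b_{l1}, \dots, b_{l(n-1)}).$$

By Lemma 2.22,

$$f(x) = \sum_{l\in I_1(r)} c_l(1 + x)^l = \sum_{j=0}^{n-1}\left(\sum_{l\in I_1(r)} c_lb_{lj}\right)x^{n-1-j}.$$

Therefore the weight of a linear combination satisfies

$$w\left(\sum_{l\in I_1(r)} c_lN_l\right) = w(f(x)).$$

Suppose that not all $c_l$ are zero, and define $i_0$ as

$$i_0 = \min\{l \mid l \in I_1(r),\ c_l \neq 0\}.$$

Since $I(i_0)$ has at most r elements, at least m − r binary digits of $i_0$ are equal to 1. In particular,

$$i_0 \ge 1 + 2 + \cdots + 2^{m-r-1} = 2^{m-r} - 1.$$

From Lemma 2.20 (with q = 2 and c = 1),

$$w(f(x)) \ge w((x + 1)^{i_0}).$$

By Lemma 2.19 with p = 2, the combination number $\binom{i_0}{k}$ is odd exactly when every binary digit of k is at most the corresponding digit of $i_0$. So $w((x+1)^{i_0}) = 2^u$, where $u \ge m - r$ is the number of binary digits of $i_0$ equal to 1. For instance, in the extreme case $i_0 = 2^{m-r} - 1 = 1 + 2 + \cdots + 2^{m-r-1}$, every k with $0 \le k \le 2^{m-r} - 1$ has the form $k = k_0 + k_12 + \cdots + k_{m-r-1}2^{m-r-1}$ with all $k_i \le 1$. Hence all the combination numbers

$$\binom{2^{m-r}-1}{k}, \quad 0 \le k \le 2^{m-r} - 1,$$

are odd, and $w((x+1)^{2^{m-r}-1}) = 2^{m-r}$. So in every case

$$w(f(x)) \ge 2^{m-r}.$$

We conclude that $d = 2^{m-r}$. Taking r = m − 2, the minimum distance is 4. The dimension of R(m − 2, m) is

$$\sum_{s=0}^{m-2}\binom{m}{s} = 2^m - \binom{m}{m-1} - \binom{m}{m} = n - m - 1.$$

So R(m − 2, m) is a linear code $[n, n-m-1]$. The theorem is proved.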


R(m − 2, m) is a linear code $[n, n-m-1]$ with minimum distance 4, so we regard R(m − 2, m) as a class of extended Hamming codes. R(m − 2, m) is not perfect, although the Hamming codes themselves are perfect linear codes.
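For m = 4, Theorem 2.9 and Lemma 2.25 can both be confirmed by exhaustive search. The sketch below is illustrative and its helper names are hypothetical. It builds the spanning products of Definition 2.12 as bitmasks (bit j = component j). It then computes the minimum distance of R(r, 4) by trying every combination, and tests orthogonality against R(4 − r − 1, 4) using $\langle a, b\rangle = w(ab) \bmod 2$.

```python
from itertools import combinations, product

def rm_generators(r, m):
    """Products of at most r distinct rows of E (empty product = e), as bitmasks."""
    n = 2 ** m
    rows = [sum(((j >> i) & 1) << j for j in range(n)) for i in range(m)]
    gens = []
    for s in range(r + 1):
        for idx in combinations(range(m), s):
            v = (1 << n) - 1                      # e = (1, ..., 1)
            for i in idx:
                v &= rows[i]                      # componentwise product
            gens.append(v)
    return gens

def min_distance(gens):
    """Minimum weight over all nonzero F_2-combinations of the generators."""
    best = None
    for coeffs in product((0, 1), repeat=len(gens)):
        if any(coeffs):
            v = 0
            for c, g in zip(coeffs, gens):
                if c:
                    v ^= g
            w = bin(v).count("1")
            best = w if best is None else min(best, w)
    return best

def orthogonal(gens_a, gens_b):
    """<a, b> = w(ab) mod 2 for every pair of generators."""
    return all(bin(a & b).count("1") % 2 == 0 for a in gens_a for b in gens_b)

m = 4
for r in range(m):
    g = rm_generators(r, m)
    print(r, len(g), min_distance(g), orthogonal(g, rm_generators(m - r - 1, m)))
```

The rows printed for r = 0, 1, 2, 3 show dimensions 1, 5, 11, 15, minimum distances 16, 8, 4, 2, and orthogonality to the claimed dual in every case. R(2, 4), with parameters [16, 11] and d = 4, is the extended Hamming case.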


2.5 Shannon Theorem 


During transmission over a channel, interference may prevent a codeword $x \in C$ from being decoded correctly after it is sent. The probability of this error is denoted p(x) and called the error probability of the codeword x. Once the code C is selected, consider the "look most like" decoding principle based on Hamming distance. If a codeword x is sent and x' is received, its error probability p(x) satisfies

$$p(x) = 0 \ \text{ if } d(x, x') \le \rho_1 < \frac{d}{2}, \qquad p(x) > 0 \ \text{ if } d(x, x') > \rho_1,$$

where $\rho_1$ is the disjoint radius and d the minimum distance of the code C. Therefore the error probability p(x) of the codeword x depends on the code C. The error probability of the code C, with M = |C|, is

$$p(C) = \frac{1}{M}\sum_{x\in C} p(x).$$
In general the error probability of a codeword is difficult to calculate mathematically, so we take the binary channel as an example. Let $C \subset \mathbb{F}_2^n$ be a binary code of length n. To calculate the error probability p(x) of $x \in C$, we assume that the character 0 is transmitted wrongly with probability p, $p < \frac{1}{2}$, that is, 0 is received as 1 with probability p. The character 1 is also transmitted wrongly with probability p. The value of p is usually very small, but errors do occur because of channel interference. We further assume that each transmission of a character 0 or 1 is wrong with probability p, independently of all other transmissions; such a channel is called memoryless. In a memoryless binary channel, the transmission of a codeword $x = x_1x_2\cdots x_n \in C$ is an n-fold Bernoulli trial. This probability model provides the theoretical basis for calculating the error probability of the codeword x. We take the repetition code as an example.


Lemma 2.26 Let $A_n$ be the binary repetition code of length n, that is $A_n = \{\mathbf{0}, \mathbf{1}\} \subset \mathbb{F}_2^n$, and let $p(A_n)$ be its error probability. Then

$$\lim_{n\to\infty} p(A_n) = 0.$$

Proof The transmission of the codeword $\mathbf{0} = (0, 0, \dots, 0)$ is regarded as an n-fold Bernoulli trial. Each transmitted character 0 is received either as 0, with probability $q = 1 - p$, or as 1, with probability $p < \frac{1}{2}$. For $0 \le k \le n$, the probability that 0 is received exactly k times is

$$\binom{n}{k}q^kp^{n-k}.$$

Let $\mathbf{0}'$ be the word received when $\mathbf{0}$ is sent. If $k > \frac{n}{2}$, then $\mathbf{0}'$ contains more than $\frac{n}{2}$ characters 0, so $d(\mathbf{0}, \mathbf{0}') = n - k < \frac{n}{2}$. The disjoint radius of the repetition code is $\left[\frac{n-1}{2}\right]$, so by the decoding principle we can always decode $\mathbf{0}' \to \mathbf{0}$ correctly. Therefore an error for the codeword $\mathbf{0} \in \mathbb{F}_2^n$ can occur only when $k \le \frac{n}{2}$, and its error probability satisfies

$$p(\mathbf{0}) \le \sum_{0\le k\le n/2}\binom{n}{k}q^kp^{n-k}.$$

Similarly, the error probability of the codeword $\mathbf{1} = (1, 1, \dots, 1) \in \mathbb{F}_2^n$ satisfies

$$p(\mathbf{1}) \le \sum_{0\le k\le n/2}\binom{n}{k}q^kp^{n-k}.$$

Therefore the error probability of the repetition code $A_n$ satisfies

$$p(A_n) = \frac{1}{2}\left(p(\mathbf{0}) + p(\mathbf{1})\right) \le \sum_{0\le k\le n/2}\binom{n}{k}q^kp^{n-k}.$$

To find the limit of this expression as $n \to \infty$, note first that

$$\sum_{0\le k\le n/2}\binom{n}{k} \le \sum_{0\le k\le n}\binom{n}{k} = 2^n.$$

Because $p < \frac{1}{2}$, we have p < q, and when $k \le \frac{n}{2}$,

$$k\log\frac{q}{p} \le \frac{n}{2}\log\frac{q}{p}.$$

It follows directly that

$$q^kp^{n-k} = p^n\left(\frac{q}{p}\right)^k \le p^n\left(\frac{q}{p}\right)^{n/2} = (qp)^{n/2}.$$

Thus

$$p(A_n) \le 2^n(qp)^{n/2} = (4qp)^{n/2}.$$

When $p < \frac{1}{2}$,

$$p - p^2 = \frac{1}{4} - \left(\frac{1}{2} - p\right)^2 < \frac{1}{4},$$

so $p(1 - p) = pq < \frac{1}{4}$, that is, $4pq < 1$. Therefore

$$0 \le \lim_{n\to\infty} p(A_n) \le \lim_{n\to\infty}(4qp)^{n/2} = 0.$$

The Lemma holds.
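The bound in the proof is easy to evaluate. The sketch below uses a hypothetical helper and an assumed value p = 0.1. It computes the sum $\sum_{k\le n/2}\binom{n}{k}q^kp^{n-k}$, which for odd n equals $p(A_n)$ exactly because majority decoding then has no ties, and prints it next to the bound $(4pq)^{n/2}$.

```python
from math import comb

def repetition_error_bound(n, p):
    """Sum over 0 <= k <= n/2 of C(n,k) q^k p^(n-k), the bound for p(A_n) in the proof."""
    q = 1 - p
    return sum(comb(n, k) * q ** k * p ** (n - k) for k in range(n // 2 + 1))

p = 0.1                                   # assumed channel error probability
for n in (1, 5, 11, 21, 51):
    print(n, repetition_error_bound(n, p), (4 * p * (1 - p)) ** (n / 2))
```

Both columns shrink geometrically as n grows, and the first is always below the second, as the proof predicts.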


From now on we assume that the channel is a binary memoryless symmetric channel and that every code is binary. Each transmission of the characters 0 and 1 is wrong with probability p, where $q = 1 - p$ and $p < \frac{1}{2}$. For a given codeword length n and number of codewords $M = M_n$, we define Shannon's probability $P^*(n, M_n, p)$ as

$$P^*(n, M_n, p) = \min\{p(C) \mid C \subset \mathbb{F}_2^n,\ |C| = M_n\}.$$

Shannon proved the following famous theorem in 1948.

Theorem 2.10 (Shannon) In a memoryless symmetric binary channel, let $0 < \lambda < 1 + p\log p + q\log q$ be a given real number and $M_n = 2^{[\lambda n]}$. Then

$$\lim_{n\to\infty} P^*(n, M_n, p) = 0.$$
To understand the meaning of Shannon's theorem and to prove it, we need some auxiliary results.

Lemma 2.27 Let $0 < \lambda < 1 + p\log p + q\log q$ be a given real number. For any binary code $C \subset \mathbb{F}_2^n$ with $|C| = 2^{[\lambda n]}$, the code rate $R_C$ of C satisfies

$$\lambda - \frac{1}{n} < R_C \le \lambda.$$

In particular, the rate of C approaches λ as $n \to \infty$.

Proof

$$|C| = 2^{[\lambda n]} \;\Rightarrow\; \log_2|C| = [\lambda n] \le \lambda n.$$

Therefore

$$R_C = \frac{1}{n}\log_2|C| \le \lambda.$$

By the properties of the integer-part function, $\lambda n < [\lambda n] + 1$, so

$$\lambda n - 1 < [\lambda n] = \log_2|C|.$$

Hence

$$\lambda - \frac{1}{n} < \frac{1}{n}\log_2|C| = R_C.$$

Lemma 2.27 holds.


Combined with Lemma 2.27, Shannon's theorem says the following. As the code length n tends to infinity, the code rate tends to λ, which may be any number below the capacity $1 - H(p)$ of the channel, and yet there exists a code C whose error probability is arbitrarily small. Shannon called such a code a "good code". He first proved the existence of "good codes" under more general conditions by a probabilistic method, and Theorem 2.10 is only a special case of his channel coding theorem. To prove Theorem 2.10, we must accurately estimate the error probability of a given number of codewords under the decoding principle.


Lemma 2.28 In a memoryless binary channel, let each transmission of the characters 0 and 1 be wrong with probability p, and let $q = 1 - p$. Suppose that exactly ω characters of a codeword $x = x_1x_2\cdots x_n \in \mathbb{F}_2^n$ are wrong after transmission. Then for any $\varepsilon > 0$, with $b = \sqrt{\frac{npq}{\varepsilon}}$,

$$P\{\omega > np + b\} \le \varepsilon.$$

Proof Transmitting any codeword $x = x_1x_2\cdots x_n \in \mathbb{F}_2^n$ through a memoryless binary channel is an n-fold Bernoulli trial. The number ω of errors in x is therefore a discrete random variable taking the values $0, 1, 2, \dots, n$, and the probability that ω takes the value w is

$$b(w; n, p) = \binom{n}{w}p^wq^{n-w}.$$

So ω has a binomial distribution. By Lemma 1.18 of Chapter 1, the expected value $E(\omega)$ and variance $D(\omega)$ of ω are

$$E(\omega) = np, \qquad D(\omega) = npq.$$

By the Chebyshev inequality of Corollary 1.2, for any k > 0,

$$P\{|\omega - E(\omega)| \ge k\sqrt{D(\omega)}\} \le \frac{1}{k^2}.$$

Taking $k = \frac{1}{\sqrt{\varepsilon}}$ gives $k\sqrt{D(\omega)} = b$, and therefore

$$P\{\omega > np + b\} \le P\{|\omega - np| \ge b\} \le \varepsilon.$$

Lemma 2.28 holds.
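To see how conservative the Chebyshev estimate is, the sketch below computes the exact binomial tail $P\{\omega > np + b\}$ for a few code lengths. The values p = 0.1 and ε = 0.05 are assumptions chosen only for illustration, and the helper name is hypothetical.

```python
from math import comb, sqrt

def tail_above(n, p, threshold):
    """Exact P{omega > threshold} for omega ~ Binomial(n, p)."""
    return sum(comb(n, w) * p ** w * (1 - p) ** (n - w)
               for w in range(n + 1) if w > threshold)

p, eps = 0.1, 0.05                            # assumed parameters
for n in (50, 200, 1000):
    b = sqrt(n * p * (1 - p) / eps)           # b = sqrt(npq / eps) as in Lemma 2.28
    print(n, round(b, 2), tail_above(n, p, n * p + b))
```

In each case the exact tail is well below ε. Lemma 2.28 only needs the weaker guarantee that it is at most ε.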


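Lemma 2.29 Let $\rho = [np + b]$, where $b = \sqrt{\frac{npq}{\varepsilon}}$. Then

$$\frac{\rho}{n}\log\frac{\rho}{n} = p\log p + O\left(\frac{1}{\sqrt n}\right), \qquad \left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right) = q\log q + O\left(\frac{1}{\sqrt n}\right).$$

Proof For a given $\varepsilon > 0$ we have $b = O(\sqrt n)$, so

$$\rho = np + O(\sqrt n), \qquad \frac{\rho}{n} = p + O\left(\frac{1}{\sqrt n}\right).$$

Thus

$$\frac{\rho}{n}\log\frac{\rho}{n} = \left(p + O\left(\frac{1}{\sqrt n}\right)\right)\log\left(p + O\left(\frac{1}{\sqrt n}\right)\right) = \left(p + O\left(\frac{1}{\sqrt n}\right)\right)\left(\log p + \log\left(1 + O\left(\frac{1}{\sqrt n}\right)\right)\right).$$

For a real number x with |x| < 1, we have the Taylor expansion

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots,$$

so $\log(1 + x) = O(|x|)$ for $|x| \le \frac{1}{2}$. Thus

$$\log\left(1 + O\left(\frac{1}{\sqrt n}\right)\right) = O\left(\frac{1}{\sqrt n}\right),$$

and

$$\frac{\rho}{n}\log\frac{\rho}{n} = \left(p + O\left(\frac{1}{\sqrt n}\right)\right)\left(\log p + O\left(\frac{1}{\sqrt n}\right)\right) = p\log p + O\left(\frac{1}{\sqrt n}\right).$$

The second asymptotic formula follows in the same way:

$$\left(1 - \frac{\rho}{n}\right)\log\left(1 - \frac{\rho}{n}\right) = \left(q + O\left(\frac{1}{\sqrt n}\right)\right)\log\left(q + O\left(\frac{1}{\sqrt n}\right)\right) = q\log q + O\left(\frac{1}{\sqrt n}\right).$$

Lemma 2.29 holds.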
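A numerical sanity check of Lemma 2.29 follows. If the error terms are $O(1/\sqrt n)$, multiplying them by $\sqrt n$ should give bounded numbers. The sketch assumes p = 0.1, ε = 0.05 and base-2 logarithms, and the helper names are hypothetical. It illustrates the asymptotics and does not replace the proof.

```python
from math import floor, log2, sqrt

def xlogx(x):
    return x * log2(x)

def scaled_errors(p=0.1, eps=0.05, sizes=(10 ** 2, 10 ** 4, 10 ** 6)):
    """sqrt(n) times the two error terms of Lemma 2.29, with rho = [np + b]."""
    q = 1 - p
    out = []
    for n in sizes:
        rho = floor(n * p + sqrt(n * p * q / eps))
        t = rho / n
        out.append((n,
                    sqrt(n) * abs(xlogx(t) - xlogx(p)),
                    sqrt(n) * abs(xlogx(1 - t) - xlogx(q))))
    return out

for row in scaled_errors():
    print(row)
```

Both scaled error columns stay of moderate size (below 3 here) as n grows from $10^2$ to $10^6$, which is what an $O(1/\sqrt n)$ error term predicts.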


To prove Shannon's theorem we define the following auxiliary functions. For any two words $x, y \in \mathbb{F}_2^n$ and $\rho > 0$, define

$$f_\rho(y, x) = \begin{cases} 0, & \text{if } d(x, y) > \rho; \\ 1, & \text{if } d(x, y) \le \rho. \end{cases}$$

Let $C = \{x_1, x_2, \dots, x_M\} \subset \mathbb{F}_2^n$ be a binary code with |C| = M, and define

$$g_i(y) = 1 - f_\rho(y, x_i) + \sum_{j\neq i} f_\rho(y, x_j).$$

Lemma 2.30 Let $y \in \mathbb{F}_2^n$ be a given word. Then

$$g_i(y) = 0 \ \text{ if } x_i \text{ is the only codeword of C with } d(y, x_i) \le \rho, \qquad g_i(y) \ge 1 \ \text{ otherwise}.$$

Proof If $x_i$ is the unique codeword in C with $d(y, x_i) \le \rho$, then $f_\rho(y, x_i) = 1$ and $f_\rho(y, x_j) = 0$ $(j \neq i)$, so

$$g_i(y) = 1 - f_\rho(y, x_i) + \sum_{j\neq i} f_\rho(y, x_j) = 0.$$

If $d(y, x_i) > \rho$, then $f_\rho(y, x_i) = 0$, so

$$g_i(y) = 1 - f_\rho(y, x_i) + \sum_{j\neq i} f_\rho(y, x_j) = 1 + \sum_{j\neq i} f_\rho(y, x_j) \ge 1.$$

If $d(y, x_i) \le \rho$ but there is at least one $x_k \neq x_i$ with $d(y, x_k) \le \rho$, then

$$g_i(y) = 1 - f_\rho(y, x_i) + \sum_{j\neq i} f_\rho(y, x_j) = \sum_{j\neq i} f_\rho(y, x_j) \ge f_\rho(y, x_k) = 1.$$

Lemma 2.30 holds.
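Lemma 2.30 can be checked exhaustively on small random codes. The sketch below is illustrative and uses hypothetical helper names. It stores words as integer bitmasks and draws a random code with a fixed seed. For every received word y it compares the value of $g_i(y)$ with the rule stated in the lemma.

```python
import random

def hamming(x, y):
    return bin(x ^ y).count("1")

def f(rho, y, x):
    """f_rho(y, x) = 1 if d(x, y) <= rho, else 0."""
    return 1 if hamming(x, y) <= rho else 0

def g(i, y, code, rho):
    return 1 - f(rho, y, code[i]) + sum(f(rho, y, x) for j, x in enumerate(code) if j != i)

def check_lemma_2_30(n=8, M=6, rho=1, seed=3):
    rng = random.Random(seed)
    code = rng.sample(range(2 ** n), M)        # hypothetical random code, words as bitmasks
    for y in range(2 ** n):
        close = [j for j, x in enumerate(code) if hamming(x, y) <= rho]
        for i in range(M):
            value = g(i, y, code, rho)
            if close == [i] and value != 0:    # unique close codeword: g_i(y) = 0
                return False
            if close != [i] and value < 1:     # otherwise: g_i(y) >= 1
                return False
    return True

print(check_lemma_2_30())
```

In the proof of Shannon's theorem, $g_i$ acts as a 0/1-type indicator that dominates the decoding-failure event. This is why $p(x_i) \le \sum_y p(y|x_i)g_i(y)$.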


With this preparation, we give the proof of Shannon's theorem.

Proof (of Theorem 2.10) As in the hypotheses of the theorem, let $0 < \lambda < 1 + p\log p + q\log q$ be a given positive real number ($p < \frac{1}{2}$), and

$$M = M_n = 2^{[\lambda n]}, \qquad |C| = M.$$

Let

$$C = \{x_1, x_2, \dots, x_M\} \subset \mathbb{F}_2^n,$$

let $\varepsilon > 0$ be any given positive number, and put

$$b = \sqrt{\frac{npq}{\varepsilon}}, \qquad \rho = [pn + b].$$

Because $p < \frac{1}{2}$, for sufficiently large n we have $\rho = pn + O(\sqrt n) < \frac{n}{2}$.

To calculate the error probability of the codeword $x_i \in C$, suppose $x_i$ is sent and y is received. Suppose $d(x_i, y) \le \rho$ and $x_i$ is the unique codeword in C with $d(y, x_i) \le \rho$. Then $x_i$ is the most similar codeword in C, so the "look most like" decoding principle decodes y correctly as the transmitted $x_i$, and in this case the error probability of $x_i$ is 0. Otherwise a genuine decoding error may occur. The probability that y is received when $x_i$ is sent is the conditional probability $p(y|x_i)$. So, by Lemma 2.30, the error probability of $x_i$ is estimated as

$$P_i = p(x_i) \le \sum_{y\in\mathbb{F}_2^n} p(y|x_i)g_i(y) = \sum_{y\in\mathbb{F}_2^n} p(y|x_i)(1 - f_\rho(y, x_i)) + \sum_{y\in\mathbb{F}_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M} p(y|x_i)f_\rho(y, x_j). \tag{2.27}$$
By the definition of $f_\rho(y, x_i)$, the first term of the above formula is the probability that the word y received when $x_i$ is sent does not lie in the ball $B_\rho(x_i)$, i.e.

$$\sum_{y\in\mathbb{F}_2^n} p(y|x_i)(1 - f_\rho(y, x_i)) = P\{y \notin B_\rho(x_i)\}.$$

The quantity $\omega = d(y, x_i)$ is exactly the number of characters in error when $x_i \to y$. Since $\rho + 1 > np + b$, the Chebyshev estimate of Lemma 2.28 gives

$$P\{y \notin B_\rho(x_i)\} = P\{\omega > \rho\} \le P\{\omega > np + b\} \le \varepsilon.$$

From (2.27) we obtain

$$P_i = p(x_i) \le \varepsilon + \sum_{y\in\mathbb{F}_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M} p(y|x_i)f_\rho(y, x_j). \tag{2.28}$$

By the definition of the error probability p(C) of the code C,

$$p(C) = \frac{1}{M}\sum_{i=1}^{M} p(x_i) \le \varepsilon + \frac{1}{M}\sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M} p(y|x_i)f_\rho(y, x_j).$$

If C is selected at random, p(C) can be regarded as a random variable. Shannon's probability $P^*(n, M_n, p)$ is the minimum value of p(C), so it is at most the expected value of p(C), i.e.

$$P^*(n, M_n, p) \le E(p(C)) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M} E\left(p(y|x_i)f_\rho(y, x_j)\right).$$

For a given i, the random variables $p(y|x_i)$ and $f_\rho(y, x_j)$ $(j \neq i)$ are statistically independent, so

$$E\left(p(y|x_i)f_\rho(y, x_j)\right) = E(p(y|x_i))E(f_\rho(y, x_j)).$$

Hence

$$P^*(n, M_n, p) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}\sum_{\substack{j=1\\ j\neq i}}^{M} E(p(y|x_i))E(f_\rho(y, x_j)). \tag{2.29}$$
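We now calculate the expected value of $f_\rho(y, x_j)$. For fixed y, the codeword $x_j$ is selected in $\mathbb{F}_2^n$ with equal probability, so

$$E(f_\rho(y, x_j)) = \sum_{x\in\mathbb{F}_2^n}\frac{1}{2^n}f_\rho(y, x) = \frac{1}{2^n}|B_\rho(y)| = \frac{1}{2^n}|B_\rho(0)|.$$

Hence

$$P^*(n, M_n, p) \le \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}E(p(y|x_i))\sum_{\substack{j=1\\ j\neq i}}^{M}\frac{|B_\rho(0)|}{2^n} = \varepsilon + M^{-1}\sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}E(p(y|x_i))\frac{(M-1)|B_\rho(0)|}{2^n}. \tag{2.30}$$

Next we calculate the expected value of $p(y|x_i)$, with y fixed and $x_i$ randomly selected in C:

$$E(p(y|x_i)) = \sum_{i=1}^{M}\frac{1}{M}p(y|x_i) = p(y).$$

Thus

$$\sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}E(p(y|x_i)) = \sum_{i=1}^{M}\sum_{y\in\mathbb{F}_2^n}p(y) = \sum_{i=1}^{M} 1 = M.$$

From (2.30),

$$P^*(n, M_n, p) \le \varepsilon + \frac{M-1}{2^n}|B_\rho(0)|.$$

If $P^*(n, M_n, p) \le \varepsilon$ there is nothing to prove; otherwise

$$\log_2\left(P^*(n, M_n, p) - \varepsilon\right) < \log_2 M + \log_2|B_\rho(0)| - n,$$

that is,

$$\frac{1}{n}\log_2\left(P^*(n, M_n, p) - \varepsilon\right) < \frac{1}{n}\log_2 M + \frac{1}{n}\log_2|B_\rho(0)| - 1.$$

By Lemma 1.11 of Chapter 1, since $\rho < \frac{n}{2}$,

$$\frac{1}{n}\log_2|B_\rho(0)| = \frac{1}{n}\log_2\sum_{i=0}^{\rho}\binom{n}{i} \le H\left(\frac{\rho}{n}\right),$$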
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where H(x) = —x logx — (1 — x) log(l — x)(0 < x < i) is the binary entropy 
function, so there is 


1 1 
= log, (P*(n, My, p) — £) € - log; M + H(2) — 1. 
n n n 


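By hypothesis, $M = M_n = 2^{[\lambda n]}$, $\rho = [pn + b]$, $b = O(\sqrt{n})$, so we have

$$\frac{1}{n}\log_2\big(P^*(n, M_n, p) - \varepsilon\big) < \frac{[\lambda n]}{n} + H\Big(\frac{\rho}{n}\Big) - 1 = \lambda + H\Big(\frac{\rho}{n}\Big) - 1 + O\Big(\frac{1}{n}\Big).$$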
n n 


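By Lemma 2.29,

$$H\Big(\frac{\rho}{n}\Big) = -\frac{\rho}{n}\log\frac{\rho}{n} - \Big(1 - \frac{\rho}{n}\Big)\log\Big(1 - \frac{\rho}{n}\Big) = -(p\log p + q\log q) + O\Big(\frac{1}{\sqrt{n}}\Big).$$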


So

$$\frac{1}{n}\log_2\big(P^*(n, M_n, p) - \varepsilon\big) < \lambda - (1 + p\log p + q\log q) + O\Big(\frac{1}{\sqrt{n}}\Big).$$


By hypothesis, $\lambda < 1 + p\log p + q\log q$. Hence, when $n$ is sufficiently large, there is a constant $\beta > 0$ such that

$$\frac{1}{n}\log_2\big(P^*(n, M_n, p) - \varepsilon\big) < -\beta.$$


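Therefore, $0 \le P^*(n, M_n, p) < \varepsilon + 2^{-\beta n}$. Letting $n \to \infty$ on both sides and using the fact that $\varepsilon > 0$ is arbitrary, we finally obtain

$$\lim_{n\to\infty} P^*(n, M_n, p) = 0.$$

This completes the proof of the theorem.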
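The sketch below shows the mechanism of the proof numerically. It evaluates $\log_2$ of the second term $(M-1)|B_\rho(0)|/2^n$ in (2.30) for a rate $\lambda$ below $1 - H(p)$. The proof only requires $b = O(\sqrt{n})$; the concrete choice $b = 3\sqrt{npq}$ is an assumption made for illustration. So are the values $p = 0.05$ and $\lambda = 0.5$, and the function names are hypothetical.

```python
import math

def binary_entropy(x: float) -> float:
    """H(x) = -x log2 x - (1 - x) log2(1 - x), with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def log2_second_term(n: int, lam: float, p: float, c: float = 3.0) -> float:
    """log2 of (M - 1)|B_rho(0)| / 2^n from (2.30), where M = 2^[lam*n] and
    rho = [p*n + b] with the illustrative choice b = c*sqrt(n*p*q)."""
    q = 1 - p
    m = 2 ** math.floor(lam * n)                          # number of codewords M_n
    rho = math.floor(p * n + c * math.sqrt(n * p * q))    # decoding radius
    ball = sum(math.comb(n, i) for i in range(rho + 1))   # |B_rho(0)|, exact integer
    return math.log2(m - 1) + math.log2(ball) - n

p = 0.05
capacity = 1 - binary_entropy(p)   # 1 + p log p + q log q, about 0.7136
print("capacity:", capacity)
for n in (100, 500, 2000):
    print(n, log2_second_term(n, 0.5, p))
```

With these parameters the printed exponent becomes increasingly negative as $n$ grows. This is the $2^{-\beta n}$ decay used at the end of the proof.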


According to Shannon, a code whose rate is close to a given positive number $\lambda$ with

$$0 < \lambda < 1 + p\log p + q\log q = 1 - H(p),$$

and whose error probability is arbitrarily small, is called a "good code". We now further analyze the construction of such "good codes" (Shannon only proved the existence of "good codes" in the probabilistic sense).


Theorem 2.11 For given $\lambda$ with $0 < \lambda < 1 + p\log p + q\log q$ $(p < \frac{1}{2})$ and $M_n = 2^{[\lambda n]}$, if there is a perfect code $C_n$ with $|C_n| = M_n$, then

$$\lim_{n\to\infty} p(C_n) = 0.$$
Proof If the perfect code $C_n$ exists, then by Lemma 2.27,

$$\lambda - \frac{1}{n} < R_{C_n} \le \lambda.$$

Therefore, the code rate of $C_n$ can be arbitrarily close to $\lambda$. If, in addition, its error probability can be made arbitrarily small, then $C_n$ is a "good code" in the mathematical sense. To prove Theorem 2.11, note that since $C_n$ is a perfect code, its minimum distance is $d_n = 2e_n + 1$. The Hamming balls of radius $e_n$ centered at the codewords partition $\mathbb{F}_2^n$, so that

$$|C_n|\,|B_{e_n}(0)| = 2^n, \qquad\text{i.e.,}\qquad R_{C_n} = 1 - \frac{1}{n}\log_2|B_{e_n}(0)|.$$


Since $\lim_{n\to\infty} R_{C_n} = \lambda$, by Theorem 2.2 we have

$$\lim_{n\to\infty} H\Big(\frac{e_n}{n}\Big) = 1 - \lambda > H(p).$$

The binary entropy function $H(x)$ is continuous and strictly increasing for $0 < x < \frac{1}{2}$. Hence the limit $\lim_{n\to\infty} \frac{e_n}{n}$ exists, and

$$\lim_{n\to\infty}\frac{e_n}{n} > p,$$

that is, $\frac{e_n}{n} > p$ when $n$ is sufficiently large.

n>œ n n 
Now consider the error probability $p(x)$ of a codeword $x = x_1x_2\cdots x_n \in C_n$. Since $C_n$ is an $e_n$-error-correcting code, if $x$ is received as $x'$ with $d(x, x') \le e_n$, we can always decode correctly, and in this case the error probability of $x$ is 0. Therefore, a transmission error of $x$ (that is, a case where $x'$ cannot be decoded correctly) occurs only when $d(x', x) = w_n > e_n$. In that case we have, when $n$ is sufficiently large,

$$\frac{w_n}{n} > \frac{e_n}{n} > p + \varepsilon \quad (\text{for some } \varepsilon > 0).$$
n n 


So the error probability $p(x)$ of $x \in C_n$ is estimated by

$$p(x) \le P\Big\{\frac{w_n}{n} > p + \varepsilon\Big\} \le P\Big\{\Big|\frac{w_n}{n} - p\Big| > \varepsilon\Big\}.$$

As $n \to \infty$, the random variable sequence $\{w_n\}$ is a Bernoulli random process: for each $n$, $w_n$ is the number of errors in $n$ independent Bernoulli trials. From Theorem 1.2 in Chap. 1, we have

$$\lim_{n\to\infty} p(x) \le \lim_{n\to\infty} P\Big\{\Big|\frac{w_n}{n} - p\Big| > \varepsilon\Big\} = 0.$$

Since this holds for every $x \in C_n$,

$$\lim_{n\to\infty} p(C_n) = 0.$$

Theorem 2.11 holds.
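In the proof of Theorem 2.11, the error probability of a codeword is bounded by the tail $P\{w_n > e_n\}$ of the number $w_n$ of flipped bits. The sketch below computes that binomial tail exactly, working in log space to avoid overflow. The radius $e_n = \lfloor n/10 \rfloor$ (so $e_n/n = 0.1 > p = 0.05$) is a hypothetical choice that only mimics the hypothesis $e_n/n > p + \varepsilon$. It is not the radius of any particular perfect code.

```python
import math

def binomial_tail(n: int, e: int, p: float) -> float:
    """P{w > e} for w ~ Binomial(n, p): the probability that more than e
    of n transmitted bits are flipped by a BSC with crossover probability p."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for w in range(e + 1, n + 1):
        # log of C(n, w) p^w q^(n - w), computed with lgamma to avoid overflow
        log_term = (math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)
                    + w * log_p + (n - w) * log_q)
        total += math.exp(log_term)
    return total

p = 0.05
for n in (100, 1000, 5000):
    e_n = n // 10            # hypothetical radius with e_n / n = 0.1 > p
    print(n, binomial_tail(n, e_n, p))
```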


The proofs of Theorems 2.10 and 2.11 show that Shannon selects a code at random and selects a codeword at random. In effect, the input information is a random event in a given probability space, and the transmission of information is a random process. The fundamental difference between Shannon and other mathematicians of his time is that he regards information, or a code, as a random variable. The mathematical model of information transmission is a dynamic probability model rather than a static algebraic model. The natural and most important method for studying a code is therefore probability and statistics, rather than the algebraic and combinatorial methods of traditional mathematics.

From the perspective of probability theory, Theorems 2.10 and 2.11 regard a code as a random variable, but in a very special setting. The probability distribution of this random variable is the Bernoulli binomial distribution, and the statistical characteristics of the code rate are not expressed explicitly. The core of Shannon's information theory is the study of the relationship between codes and random variables with general probability distributions. One of its most basic concepts is information entropy, or code entropy, which displays the statistical characteristics of a code clearly. Here we see the basic framework and prototype of modern information theory. In the next chapter, we explain and prove these basic ideas and results of Shannon's information theory in detail.

One of the most important results is Shannon's channel coding theorem (see Theorem 3.12 in Chap. 3). Using the probabilistic method, Shannon proved that for a general memoryless channel (symmetric or not), so-called good codes exist whose code rate reaches the channel capacity and whose error probability is arbitrarily small. Conversely, the code rate of a code with arbitrarily small error probability cannot exceed the capacity of the channel. This channel capacity is called Shannon's limit, and electronic communication engineering has pursued it for a long time. Engineers seek a channel coding scheme whose error probability stays within a controllable range (e.g., less than $\varepsilon$) and whose transmission efficiency (i.e., code rate) reaches Shannon's limit. In today's 5G era, this engineering problem seems to have been overcome. Returning to Theorem 2.10, the upper limit $1 - H(p)$ of the code rate is the channel capacity of the memoryless symmetric binary channel (see Example 2 in Sect. 8 of Chap. 3). From this example, we can get a glimpse of Shannon's channel coding theory.


Exercise 2 


1. Please design a code of length 7 containing 8 codewords in which the Hamming distance between any two codewords is $\ge 4$. The code is transmitted through a binary symmetric channel. Assuming the error probability of characters 0 and 1 is $p$, calculate the probability that a codeword is transmitted successfully.

2. Let $C$ be a binary code of length 16 satisfying

(i) each codeword has weight 6;
(ii) any two codewords have Hamming distance 8.

Prove: $|C| \le 16$. Does a binary code $C$ with $|C| = 16$ exist?

3. Let $C$ be a binary code of length $n$ that corrects one error. Prove that

$$|C| \le \frac{2^n}{n+2} \quad (n \text{ even}).$$

4. Let $C$ be a binary perfect code of length $n$ with minimum distance 7. Prove: $n = 7$ or $n = 23$.

5. Let $C \subset \mathbb{F}_q^n$ be a linear code $C = [n, k]$ in which any $k$ coordinates are symmetric. Prove that the minimum distance of $C$ is $d = n - k + 1$.

6. Suppose $C = [2k+1, k] \subset \mathbb{F}_2^{2k+1}$ and $C \subset C^\perp$. Write out the difference set $C^\perp \setminus C$.

7. Let $x = x_1x_2\cdots x_6 \in \mathbb{F}_2^6$. Determine the size $|B_1(x)|$ of the Hamming ball. Can we find a code $C \subset \mathbb{F}_2^6$ with $|C| = 9$ such that the Hamming distance between any two different codewords in $C$ is $\ge 3$?

8. Let $C = [n, k] \subset \mathbb{F}_q^n$ be a linear code with generator matrix $G$. If no column of $G$ is all zero, prove

$$\sum_{x\in C} w(x) = n(q-1)q^{k-1},$$

where $w(x)$ is the weight of the codeword $x$.

9. Let $C = [n, k]$ be a linear binary code containing a codeword of odd weight. Prove that the codewords of even weight in $C$ form a linear code $[n, k-1]$.

10. Let $C$ be a linear binary code with generator matrix

$$G = \begin{pmatrix} 1&0&0&0&1&0&1\\ 0&1&0&0&1&0&1\\ 0&0&1&0&0&1&1\\ 0&0&0&1&0&1&1 \end{pmatrix}.$$

Please decode the following received words: $y_1 = (1101011)$, $y_2 = (0110111)$, $y_3 = (0111000)$.

11. Let $p$ be a prime. Is there a self-dual linear code $C = [8, 4]$ over $\mathbb{F}_p$?

12. Let $R_r$ be the rate of the binary Hamming code of length $2^r - 1$. Find $\lim_{r\to\infty} R_r$.

13. Let $C$ be a linear binary code with weight distribution polynomial $A(z)$. Find the weight distribution polynomial $B(z)$ of the dual code $C^\perp$.

14. Let $C = [n, k] \subset \mathbb{F}_2^n$ have weight distribution polynomial $A(z)$. Codewords are transmitted over a binary symmetric channel with error probability $p$ (the error probability of characters 0 and 1), and we wish to detect any codeword transmission error. Calculate the probability that a codeword transmission error is not detected.

15. Prove that there is no linear code $C = [15, 8]$ with minimum distance 5 over any finite field $\mathbb{F}_q$.

16. Let $n = 2^m$. Prove that the Reed-Muller code $R(1, m)$ is a Hadamard code of length $n$.
17. Prove that the ternary Golay code has 132 codewords of weight 5. For each codeword $x$ of weight 5, consider the pair $(x, 2x)$, and take the set of coordinates at which $x$ is nonzero as a subset. Prove that there are 66 such subsets and that they form a $4$-$(11, 5, 1)$ design.

18. If the minimum distance $d$ of a binary code $C = (n, M, d)$ is even, prove that there exists a binary code with the same parameters $(n, M, d)$ all of whose codewords have even weight.

19. Let $H$ be a Hadamard matrix $H_{12}$, and define

$$A = H - I, \qquad G = (I, A),$$

where $I$ is the identity matrix. Prove that $G$ is the generator matrix of a ternary code $[24, 12]$ with minimum distance 9.

20. Let $C = [4, 2]$ be the ternary Hamming code and $H$ the check matrix of $C$. Let $I$ be the identity matrix of order 4 and $J$ the square matrix of order 4 with all entries equal to 1. Define

$$G = \begin{pmatrix} J + I & I & I \\ 0 & H & -H \end{pmatrix}.$$

Prove that $G$ generates a ternary code $C = [12, 6]$ with minimum distance 6.
Chapter 3
Shannon Theory


3.1 Information Space 


According to Shannon, a message $x$ is a random event. Let $p(x)$ be the probability of occurrence of the event $x$. If $p(x) = 0$, the event does not occur; if $p(x) = 1$, the event must occur. When $p(x) = 0$ or $p(x) = 1$, the information $x$ can be called trivial information or spam information. Therefore, the real mathematical significance of information $x$ lies in its uncertainty, that is, $0 < p(x) < 1$. Quantitative research on the uncertainty of nontrivial information is the starting point of all of Shannon's theory. The quantity involved is now called information quantity, information entropy, or entropy for short.

Shannon and his colleagues at Bell Laboratories took the "bit" as the basic quantitative unit of information. What is a "bit"? We can simply understand it as one digit of a binary number. According to Shannon, a binary number with $n$ digits can express up to $2^n$ numbers. From the point of view of probability and statistics, each of these $2^n$ numbers occurs with probability $\frac{1}{2^n}$. Therefore, a bit is the amount of information contained in an event $x$ with probability $\frac{1}{2}$. Taking this as the starting point, Shannon defined the self-information $I(x)$ contained in an information $x$ as

$$I(x) = -\log_2 p(x). \tag{3.1}$$

Therefore, a piece of information $x$ contains $I(x)$ bits of information; when $p(x) = \frac{1}{2}$, $I(x) = 1$. Equation (3.1) is Shannon's first extraordinary advance in quantifying information. On the other hand, with the emergence of the telegraph and the telephone, binary came into wide use in the conversion and transmission of information. We can therefore assert that without binary there would be no Shannon theory, let alone today's informatics and information age. The purpose of this section is to derive, rigorously and in simplified form, the most basic and important conclusions of Shannon's theory. First, we start with the rationality of definition (3.1).
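Formula (3.1) is easy to evaluate directly. A minimal Python sketch follows; the function name `self_information` is our own.

```python
import math

def self_information(prob: float) -> float:
    """Self-information I(x) = -log2 p(x) in bits, formula (3.1)."""
    if not 0 < prob <= 1:
        raise ValueError("p(x) must lie in (0, 1]")
    return -math.log2(prob)

print(self_information(1 / 2))       # 1.0: one bit
print(self_information(1 / 2**8))    # 8.0: one of 2^8 equally likely 8-digit binary numbers
```

The second call reflects the remark above. Each of the $2^8$ equally likely 8-digit binary numbers carries 8 bits.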

Let $I(x)$ denote the self-information of a random event $x$. The greater the probability of occurrence $p(x)$, the smaller the uncertainty, so $I(x)$ should be a monotonically decreasing function of the probability $p(x)$. If $xy$ is a joint event and


© The Author(s) 2022 91 
Z. Zheng, Modern Cryptography Volume 1, Financial Mathematics and Fintech, 
https://doi.org/10.1007/978-981-19-0920-7 3 


$x$ and $y$ are statistically independent, that is, $p(xy) = p(x)p(y)$, then the self-information satisfies $I(xy) = I(x) + I(y)$. Of course, the self-information $I(x)$ is nonnegative, that is, $I(x) \ge 0$. Shannon proved that a self-information $I(x)$ satisfying the above three assumptions must be

$$I(x) = -c\log p(x),$$

where $c$ is a constant. This conclusion can be derived directly from the following mathematical theorem.


Lemma 3.1 If the real function $f(x)$ satisfies the following conditions on the interval $[1, +\infty)$:

(i) $f(x) \ge 0$;
(ii) if $x < y$, then $f(x) < f(y)$;
(iii) $f(xy) = f(x) + f(y)$,

then $f(x) = c\log x$, where $c$ is a constant.


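Proof Repeated use of condition (iii) gives

$$f(x^k) = k f(x), \quad k \ge 1,$$

for any positive integer $k$. Taking $x = 1$, the above formula holds for all $k$ if and only if $f(1) = 0$. It then follows from (ii) that $f(x) > 0$ when $x > 1$. Let $x > 1$, $y > 1$ and $k \ge 1$ be given. We can always find a nonnegative integer $n$ such that

$$y^n \le x^k < y^{n+1}.$$

Taking logarithms on both sides, we get

$$\frac{n}{k} \le \frac{\log x}{\log y} < \frac{n+1}{k}.$$

On the other hand, by (ii) and (iii) we have

$$n f(y) \le k f(x) < (n+1) f(y),$$

thus

$$\frac{n}{k} \le \frac{f(x)}{f(y)} < \frac{n+1}{k}, \qquad\text{so}\qquad \Big|\frac{f(x)}{f(y)} - \frac{\log x}{\log y}\Big| < \frac{1}{k}.$$

Letting $k \to \infty$, we have

$$\frac{f(x)}{f(y)} = \frac{\log x}{\log y}, \quad \forall\, x, y \in (1, +\infty).$$

Therefore,

$$\frac{f(x)}{\log x} = \frac{f(y)}{\log y}, \quad \forall\, x, y \in (1, +\infty).$$

That is, $f(x) = c\log x$, which also holds at $x = 1$ since $f(1) = 0$. The Lemma holds.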


In Lemma 3.1, let $I(x) = f\big(\frac{1}{p(x)}\big)$. Then $f$ satisfies conditions (i), (ii) and (iii), thus $I(x) = -c\log p(x)$. Taking $c = 1$ and logarithms to base 2, (3.1) holds.

To introduce the definition of an information space, we use $X$ to denote a finite set of original information, or a countable information set, which is called a source state set. It can be an alphabet, a finite set of symbols, or a set of numbers. For example, the 26 letters of English and the 2-element finite field $\mathbb{F}_2$ are commonly used source state sets. Elements of $X$ can be called messages, events, or characters. We often use English capital letters such as $X, Y, Z$ to denote source state sets, and lowercase Greek letters $\xi, \eta, \ldots$ to denote random variables in a given probability space.


Definition 3.1 Let the value space of a random variable $\xi$ be a source state set $X$. The probability distribution of the characters of $X$, regarded as events, is defined as

$$p(x) = P\{\xi = x\}, \quad \forall x \in X. \tag{3.2}$$

We call $(X, \xi)$ an information space in the given probability space. When the random variable $\xi$ is clear, we usually write the information space $(X, \xi)$ simply as $X$. If $\eta$ is another random variable valued on $X$, and $\xi$ and $\eta$ obey the same probability distribution, that is,

$$P\{\xi = x\} = P\{\eta = x\}, \quad \forall x \in X,$$

then the two information spaces are called equal, $(X, \xi) = (X, \eta)$, usually written as $X$.


As can be seen from Definition 3.1, an information space $X$ constitutes a finite complete event group, that is,

$$\sum_{x\in X} p(x) = 1, \quad 0 \le p(x) \le 1, \quad x \in X. \tag{3.3}$$

Note that if two random variables $\xi$ and $\eta$ valued on $X$ obey different probability distributions, then $(X, \xi)$ and $(X, \eta)$ are two different information spaces. In that case we must distinguish them by writing $X_1 = (X, \xi)$ and $X_2 = (X, \eta)$.


Definition 3.2 Let $X$ and $Y$ be two source state sets, and let the random variables $\xi$ and $\eta$ take values in $X$ and $Y$, respectively. If $\xi$ and $\eta$ are compatible random variables, the probability distribution of the joint event $xy$ $(x \in X, y \in Y)$ is defined as

$$p(xy) = P\{\xi = x, \eta = y\}, \quad \forall x \in X,\ y \in Y. \tag{3.4}$$

Then the joint event set

$$XY = \{xy \mid x \in X,\ y \in Y\},$$

together with the corresponding random variables $\xi$ and $\eta$, is called the product space of the information spaces $(X, \xi)$ and $(Y, \eta)$, denoted $(XY, \xi, \eta)$. When $\xi$ and $\eta$ are clear, it is abbreviated as $XY = (XY, \xi, \eta)$. If $X = Y$ are two identical source state sets and $\xi$ and $\eta$ have the same probability distribution, then the product space $XY$ is denoted $X^2$ and called a power space.


Since an information space is a complete group of events, the definition of the product information space gives the following total probability formulas and probability product formula:

$$\sum_{x\in X} p(xy) = p(y),\ \forall y \in Y; \qquad \sum_{y\in Y} p(xy) = p(x),\ \forall x \in X, \tag{3.5}$$

and

$$p(x)\,p(y|x) = p(xy), \quad \forall x \in X,\ y \in Y,$$

where $p(y|x)$ is the conditional probability of $y$ given $x$.


Definition 3.3 Let $X_1, X_2, \ldots, X_n$ $(n \ge 2)$ be $n$ source state sets, and let $\xi_1, \xi_2, \ldots, \xi_n$ be $n$ compatible random variables with values in $X_1, X_2, \ldots, X_n$, respectively. The probability distribution of the joint event $x_1x_2\cdots x_n$ is

$$p(x_1x_2\cdots x_n) = P\{\xi_1 = x_1, \xi_2 = x_2, \ldots, \xi_n = x_n\}. \tag{3.6}$$

Then

$$X_1X_2\cdots X_n = \{x_1x_2\cdots x_n \mid x_i \in X_i,\ 1 \le i \le n\}$$

is called the product of the $n$ information spaces. In particular, when $X_1 = X_2 = \cdots = X_n = X$ and each $\xi_i$ has the same probability distribution on $X$, define $X^n = X_1X_2\cdots X_n$, called the $n$-th power space of the information space $X$.


Let us give some classic examples of information space. 


Example 3.1 (Two-point information space with parameter $\lambda$) Let $X = \{0, 1\} = \mathbb{F}_2$ be the binary finite field, and let the random variable $\xi$ on $X$ obey the two-point distribution with parameter $\lambda$, that is,

$$p(0) = P\{\xi = 0\} = \lambda, \qquad p(1) = P\{\xi = 1\} = 1 - \lambda,$$

where $0 < \lambda < 1$. Then $(X, \xi)$ is called a two-point information space with parameter $\lambda$, still denoted $X$.
Example 3.2 (Equal probability information space) Let $X = \{x_1, x_2, \ldots, x_n\}$ be a source state set, and let the random variable $\xi$ on $X$ obey the equal probability distribution, that is,

$$p(x) = P\{\xi = x\} = \frac{1}{|X|}, \quad \forall x \in X.$$

Then $(X, \xi)$ is called an equal probability information space, still denoted $X$.


Example 3.3 (Bernoulli information space) Let $X_0 = \{0, 1\} = \mathbb{F}_2$, and let the random variable $\xi_i$ be the $i$-th Bernoulli trial; then $\{\xi_i\}_{i=1}^{n}$ is a set of independent and identically distributed random variables. Consider the product space

$$X = (X_0, \xi_1)(X_0, \xi_2)\cdots(X_0, \xi_n) = X_0^n = \mathbb{F}_2^n.$$

This power space $X$ is called a Bernoulli information space, also called a memoryless binary information space. The probability function $p(x)$ on $X$ is

$$p(x) = p(x_1x_2\cdots x_n) = \prod_{i=1}^{n} p(x_i), \quad x_i = 0 \text{ or } 1, \tag{3.7}$$

where $p(0) = \lambda$, $p(1) = 1 - \lambda$.
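Formula (3.7) can be checked by brute force for a small length. The sketch below uses a hypothetical helper `bernoulli_prob` and the illustrative values $n = 4$, $\lambda = 0.3$. It enumerates $\mathbb{F}_2^n$ and confirms that the product probabilities form a complete event group, as required by (3.3).

```python
import math
from itertools import product

def bernoulli_prob(word, lam: float) -> float:
    """p(x_1 ... x_n) = prod p(x_i) with p(0) = lam, p(1) = 1 - lam, formula (3.7)."""
    return math.prod(lam if bit == 0 else 1 - lam for bit in word)

n, lam = 4, 0.3
total = sum(bernoulli_prob(w, lam) for w in product((0, 1), repeat=n))
print(total)   # equals 1 up to rounding
```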


Example 3.4 (Degenerate information space) If $X = \{x\}$ contains only one character, $X$ is called a degenerate information space, or trivial information space. The random variable $\xi$ takes the value $x$ with probability 1, that is, $P\{\xi = x\} = 1$. In this case, $\xi$ is a random variable with a degenerate distribution.


Definition 3.4 Let $X = \{x_1, x_2, \ldots, x_n\}$ be a source state set. If $X$ is an information space, the information entropy $H(X)$ of $X$ is defined as

$$H(X) = -\sum_{x\in X} p(x)\log p(x) = -\sum_{i=1}^{n} p(x_i)\log p(x_i). \tag{3.8}$$

If $p(x_i) = 0$ in the above formula, we agree that $p(x_i)\log p(x_i) = 0$. The base of the logarithm can be chosen arbitrarily; if the base is $D$ $(D \ge 2)$, then $H(X)$ is called the $D$-ary entropy, sometimes denoted $H_D(X)$.
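Here is a direct transcription of (3.8), given as a sketch. The distribution is passed as a list of probabilities, and terms with $p(x) = 0$ are skipped according to the convention $0\log 0 = 0$. The `base` parameter selects the $D$-ary entropy $H_D(X)$. The function name and interface are our own.

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """H(X) = -sum p(x) log p(x), formula (3.8), with 0 log 0 = 0.
    base=2 gives bits; base=D gives the D-ary entropy H_D(X)."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) / math.log(base) for p in probs if p > 0)

print(entropy([0.25] * 4))   # 2 bits for an equal probability space with 4 characters
print(entropy([1.0]))        # a degenerate space has entropy 0
```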


Theorem 3.1 For any information space $X$, we always have

$$0 \le H(X) \le \log|X|. \tag{3.9}$$

Moreover, $H(X) = 0$ if and only if $X$ is a degenerate information space, and $H(X) = \log|X|$ if and only if $X$ is an equal probability information space.


Proof $H(X) \ge 0$ is trivial. We only prove the inequality on the right of (3.9). Because $f(x) = \log x$ is a strictly concave real function, Lemma 1.7 in Chap. 1 can be applied to the positive function $g(x) = \frac{1}{p(x)}$. Let $X = \{x_1, x_2, \ldots, x_m\}$ with $p(x_i) > 0$; then

$$H(X) = \sum_{i=1}^{m} p(x_i)\log\frac{1}{p(x_i)} \le \log\sum_{i=1}^{m} p(x_i)\frac{1}{p(x_i)} = \log m.$$

Equality holds if and only if $p(x_1) = p(x_2) = \cdots = p(x_m) = \frac{1}{m}$, that is, $X$ is an equal probability information space. If $X = \{x\}$ is a degenerate information space, then $p(x) = 1$, so $H(X) = 0$. Conversely, suppose $H(X) = 0$ and let $X = \{x_1, x_2, \ldots, x_m\}$. If there were some $x_i \in X$ with $0 < p(x_i) < 1$, then

$$0 < p(x_i)\log\frac{1}{p(x_i)} \le H(X),$$

a contradiction. So $p(x_i) = 1$ for some $i$ and $p(x_j) = 0$ $(j \neq i)$. In this case $X$ degenerates into $X = \{x_i\}$, which is a trivial information space. Theorem 3.1 holds.
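Theorem 3.1 can also be probed numerically. This sketch draws random distributions on a six-element source state set and asserts $0 \le H(X) \le \log_2|X|$. The seed and sample size are arbitrary choices. The last two lines show the extreme cases, the degenerate space and the equal probability space.

```python
import math
import random

def entropy_bits(probs):
    """H(X) in bits, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def random_distribution(m, rng):
    """A random probability vector of length m (normalized uniform weights)."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(2022)          # fixed seed so the run is reproducible
m = 6
for _ in range(1000):
    h = entropy_bits(random_distribution(m, rng))
    assert 0 <= h <= math.log2(m) + 1e-12       # inequality (3.9)

print(entropy_bits([1.0, 0, 0, 0, 0, 0]))       # degenerate space
print(entropy_bits([1 / m] * m), math.log2(m))  # equal probability space
```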


An information space is a dynamic code: it changes when the random variable on it changes. For such a "dynamic code" $X$, Shannon uses the information entropy $H(X)$ in place of the code rate. The information entropy $H(X)$ thus becomes the first mathematical quantity describing a dynamic code. By Theorem 3.1, a dynamic code has the minimum rate 0 when it is degenerate. When it is of equal probability, it has the maximum rate, which is the rate of the usual static code.

Next, we discuss the information entropy of several typical information spaces. 


Example 3.5 (i) Let $X$ be the two-point information space with parameter $\lambda$. Then

$$H(X) = -\lambda\log\lambda - (1-\lambda)\log(1-\lambda) = H(\lambda).$$

The function $H(\lambda)$ was defined in Chap. 1, where it was called the binary information entropy function. Now we see why it is called an entropy function.

(ii) If $X = \{x\}$ is a degenerate information space, then $H(X) = 0$.
(iii) If $X$ is an equal probability information space, then $H(X) = \log|X|$.
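The binary entropy function of Example 3.5 (i) can be tabulated in a few lines. The sketch below evaluates $H(\lambda)$ on a grid and locates its maximum. The maximum is at $\lambda = \frac{1}{2}$, where the two-point space becomes an equal probability space, in agreement with Theorem 3.1.

```python
import math

def binary_entropy(lam: float) -> float:
    """H(lam) = -lam log2 lam - (1 - lam) log2(1 - lam): the entropy of the
    two-point information space with parameter lam (Example 3.5 (i))."""
    return -sum(p * math.log2(p) for p in (lam, 1 - lam) if p > 0)

grid = [k / 100 for k in range(101)]
values = [binary_entropy(x) for x in grid]
print(max(values), grid[values.index(max(values))])   # maximum value and where it occurs
```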


Remark Most authors directly regard a random variable as an information space. Mathematically this is convenient, and the resulting notion is called the information measure of random variables. From the perspective of information, however, the concept of an information space gives a better understanding and a simpler presentation of Shannon's theory. The core idea of that theory is the random measurement of information, not the information measurement of random variables.


3.2 Joint Entropy, Conditional Entropy, Mutual Information


Definition 3.5 Let $X, Y$ be two information spaces, and let $\xi, \eta$ be the corresponding random variables. If $\xi$ and $\eta$ are independent random variables, that is,

$$P\{\xi = x, \eta = y\} = P\{\xi = x\}\cdot P\{\eta = y\}, \quad \forall x \in X,\ y \in Y,$$

then $X$ and $Y$ are called independent information spaces, and the probability distribution of joint events is

$$p(xy) = p(x)p(y), \quad \forall x \in X,\ y \in Y.$$


Definition 3.6 Let $X, Y$ be two information spaces. The information entropy $H(XY)$ of the product space $XY$ is called the joint entropy of $X$ and $Y$, that is,

$$H(XY) = -\sum_{x\in X}\sum_{y\in Y} p(xy)\log p(xy). \tag{3.10}$$

The conditional entropy $H(X|Y)$ of $X$ given $Y$ is defined as

$$H(X|Y) = -\sum_{x\in X}\sum_{y\in Y} p(xy)\log p(x|y). \tag{3.11}$$


Lemma 3.2 (Addition formula of entropy) For any two information spaces $X$ and $Y$, we have

$$H(XY) = H(X) + H(Y|X) = H(Y) + H(X|Y).$$

In general, for $n$ information spaces $X_1, X_2, \ldots, X_n$, we have

$$H(X_1X_2\cdots X_n) = \sum_{i=1}^{n} H(X_i|X_{i-1}X_{i-2}\cdots X_1). \tag{3.12}$$

Proof By (3.10) and the probability multiplication formula,

$$\begin{aligned}
H(XY) &= -\sum_{x\in X}\sum_{y\in Y} p(xy)\log p(xy) \\
&= -\sum_{x\in X}\sum_{y\in Y} p(xy)\big(\log p(x) + \log p(y|x)\big) \\
&= -\sum_{x\in X} p(x)\log p(x) + H(Y|X) \\
&= H(X) + H(Y|X).
\end{aligned}$$

In the same way,

$$H(XY) = H(Y) + H(X|Y).$$

We prove (3.12) by induction. When $n = 2$,

$$H(X_1X_2) = H(X_1) + H(X_2|X_1),$$

so the proposition is true. For general $n$, we have

$$\begin{aligned}
H(X_1X_2\cdots X_n) &= H(X_1X_2\cdots X_{n-1}) + H(X_n|X_1X_2\cdots X_{n-1}) \\
&= \sum_{i=1}^{n-1} H(X_i|X_{i-1}X_{i-2}\cdots X_1) + H(X_n|X_{n-1}X_{n-2}\cdots X_1) \\
&= \sum_{i=1}^{n} H(X_i|X_{i-1}X_{i-2}\cdots X_1).
\end{aligned}$$

Lemma 3.2 holds.
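The addition formula of Lemma 3.2 can be verified on any concrete joint distribution. In the sketch below, the $2 \times 3$ table $p(xy)$ is a hypothetical example. The marginals come from the total probability formula (3.5), and the conditional entropies come from (3.11).

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A joint distribution p(xy) on X = {0, 1}, Y = {0, 1, 2}, stored as p[x][y].
p = [[0.10, 0.20, 0.10],
     [0.30, 0.05, 0.25]]

p_x = [sum(row) for row in p]                         # total probability formula (3.5)
p_y = [sum(p[x][y] for x in range(2)) for y in range(3)]

H_XY = H([v for row in p for v in row])               # joint entropy (3.10)
# Conditional entropies (3.11), using p(y|x) = p(xy)/p(x) and p(x|y) = p(xy)/p(y)
H_Y_given_X = -sum(p[x][y] * math.log2(p[x][y] / p_x[x])
                   for x in range(2) for y in range(3) if p[x][y] > 0)
H_X_given_Y = -sum(p[x][y] * math.log2(p[x][y] / p_y[y])
                   for x in range(2) for y in range(3) if p[x][y] > 0)

print(H_XY, H(p_x) + H_Y_given_X, H(p_y) + H_X_given_Y)   # all three agree
```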


Theorem 3.2 We have

$$H(XY) \le H(X) + H(Y), \tag{3.13}$$

and

$$H(XY) = H(X) + H(Y) \tag{3.14}$$

holds if and only if $X$ and $Y$ are statistically independent information spaces. In general, we have

$$H(X_1X_2\cdots X_n) \le H(X_1) + H(X_2) + \cdots + H(X_n), \tag{3.15}$$

and

$$H(X_1X_2\cdots X_n) = H(X_1) + H(X_2) + \cdots + H(X_n) \tag{3.16}$$

holds if and only if $X_1, X_2, \ldots, X_n$ is an independent random process.


Proof By definition and Jensen's inequality, we have

$$H(XY) - H(X) - H(Y) = \sum_{x\in X}\sum_{y\in Y} p(xy)\log\frac{p(x)p(y)}{p(xy)} \le \log\sum_{x\in X}\sum_{y\in Y} p(x)p(y) = 0.$$

Equality holds if and only if $\frac{p(x)p(y)}{p(xy)} = c$ for all $x \in X$, $y \in Y$ (where $c$ is a constant), thus $p(x)p(y) = c\,p(xy)$. Summing both sides over $x$ and $y$, we have

$$1 = \sum_{x\in X}\sum_{y\in Y} p(x)p(y) = c\sum_{x\in X}\sum_{y\in Y} p(xy) = c,$$

thus $c = 1$ and $p(xy) = p(x)p(y)$. So (3.14) holds if and only if $X$ and $Y$ are independent information spaces. By induction, we obtain (3.15) and (3.16). Theorem 3.2 holds.
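The inequality (3.13) and its equality case (3.14) can be compared on two small joint tables. One table has dependent coordinates, and the other is built as a product $p(x)p(y)$. Both tables are illustrative assumptions.

```python
import math

def H(probs):
    """Entropy in bits (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(table):
    """Row and column marginals p(x), p(y) of a joint table table[x][y]."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]

dependent = [[0.40, 0.10],
             [0.10, 0.40]]
px, py = [0.3, 0.7], [0.6, 0.4]
independent = [[a * b for b in py] for a in px]    # p(xy) = p(x) p(y)

for name, table in (("dependent", dependent), ("independent", independent)):
    mx, my = marginals(table)
    joint = H([v for row in table for v in row])
    print(name, joint, H(mx) + H(my))   # joint <= sum, equality only when independent
```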


By (3.15), we have the following direct corollary: for any information space $X$ and $n \ge 1$,

$$H(X^n) \le nH(X). \tag{3.17}$$


Definition 3.7 Let $X$ and $Y$ be two information spaces. We say that $X$ is completely determined by $Y$ if, for any given $x \in X$, there is a subset $N_x \subset Y$ such that

$$p(x|y) = \begin{cases} 1, & y \in N_x, \\ 0, & y \notin N_x. \end{cases} \tag{3.18}$$


For the conditional entropy $H(X|Y)$, we have the following two important special cases.

Lemma 3.3 (i) $0 \le H(X|Y) \le H(X)$.
(ii) If the information space $X$ is completely determined by $Y$, then

$$H(X|Y) = 0. \tag{3.19}$$

(iii) If $X$ and $Y$ are two independent information spaces, then

$$H(X|Y) = H(X). \tag{3.20}$$


Proof (i) $H(X|Y) \ge 0$ because $0 \le p(x|y) \le 1$. By Lemma 3.2 and Theorem 3.2, $H(X|Y) = H(XY) - H(Y) \le H(X)$.

We now prove (3.19). By Definition 3.7 and (3.18), for given $x \in X$ we have

$$p(xy) = p(y)\,p(x|y) = 0, \quad y \notin N_x.$$

Thus, since $p(x|y) = 1$ for $y \in N_x$,

$$H(X|Y) = -\sum_{x\in X}\sum_{y\in Y} p(xy)\log p(x|y) = -\sum_{x\in X}\sum_{y\in N_x} p(xy)\log p(x|y) = 0.$$

The proof of (3.20) is straightforward. Because $X$ and $Y$ are independent, the conditional probability satisfies

$$p(x|y) = p(x), \quad \forall x \in X,\ y \in Y.$$

Thus

$$H(X|Y) = -\sum_{x\in X}\sum_{y\in Y} p(x)p(y)\log p(x) = -\sum_{x\in X} p(x)\log p(x) = H(X).$$

Lemma 3.3 holds.
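The two special cases of Lemma 3.3 can be seen on small tables; the numbers are illustrative only. In the first table, $X$ is completely determined by $Y$ through $x = y \bmod 2$, which realizes Definition 3.7 with $N_x = \{y : y \equiv x \bmod 2\}$. In the second table, $X$ and $Y$ are independent.

```python
import math

def cond_entropy(p):
    """H(X|Y) = -sum p(xy) log2 p(x|y) for a joint table p[x][y], as in (3.11)."""
    py = [sum(col) for col in zip(*p)]
    return -sum(p[x][y] * math.log2(p[x][y] / py[y])
                for x in range(len(p)) for y in range(len(py)) if p[x][y] > 0)

# X completely determined by Y: x = y mod 2 on Y = {0, 1, 2, 3} (illustrative numbers)
determined = [[0.1, 0.0, 0.3, 0.0],
              [0.0, 0.2, 0.0, 0.4]]
# X and Y independent: p(xy) = p(x) p(y) with p(x) = (1/4, 3/4), p(y) = (1/2, 1/2)
independent = [[0.25 * 0.5, 0.25 * 0.5],
               [0.75 * 0.5, 0.75 * 0.5]]

print(cond_entropy(determined))                          # zero, as in (3.19)
print(cond_entropy(independent),
      -0.25 * math.log2(0.25) - 0.75 * math.log2(0.75))  # equal, as in (3.20)
```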
Next, we define the mutual information $I(X, Y)$ of two information spaces $X$ and $Y$.

Definition 3.8 Let $X$ and $Y$ be two information spaces. Their mutual information $I(X, Y)$ is defined as

$$I(X, Y) = \sum_{x\in X}\sum_{y\in Y} p(xy)\log\frac{p(x|y)}{p(x)}. \tag{3.21}$$

From the multiplication formula of probability, for all $x \in X$, $y \in Y$,

$$p(x)p(y|x) = p(y)p(x|y) = p(xy).$$

We have

$$\frac{p(x|y)}{p(x)} = \frac{p(y|x)}{p(y)}.$$

Therefore, the definition of mutual information directly gives

$$I(X, Y) = I(Y, X).$$
Lemma 3.4

$$I(X, Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).$$

Proof By definition,

$$\begin{aligned}
I(X, Y) &= \sum_{x\in X}\sum_{y\in Y} p(xy)\log\frac{p(x|y)}{p(x)} \\
&= \sum_{x\in X}\sum_{y\in Y} p(xy)\log p(x|y) - \sum_{x\in X}\sum_{y\in Y} p(xy)\log p(x) \\
&= -H(X|Y) - \sum_{x\in X} p(x)\log p(x) \\
&= H(X) - H(X|Y).
\end{aligned}$$

In the same way,

$$I(X, Y) = H(Y) - H(Y|X).$$


Lemma 3.5 Let $X$ and $Y$ be two information spaces with mutual information $I(X, Y)$. Then

$$H(XY) = H(X) + H(Y) - I(X, Y). \tag{3.22}$$

Further, $I(X, Y) \ge 0$, and $I(X, Y) = 0$ if and only if $X$ and $Y$ are independent.

Proof By the addition formula of Lemma 3.2 and by Lemma 3.4,

$$\begin{aligned}
H(XY) &= H(X) + H(Y|X) \\
&= H(X) + H(Y) - \big(H(Y) - H(Y|X)\big) \\
&= H(X) + H(Y) - I(X, Y).
\end{aligned}$$

The conclusion about $I(X, Y) \ge 0$ then follows directly from (3.22) and Theorem 3.2.
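As an illustration of Definition 3.8 and Lemma 3.5, the sketch below computes $I(X, Y)$ for a uniform binary input sent through a binary symmetric channel with crossover probability $\varepsilon = 0.1$, a hypothetical setting. The result coincides with $1 - H(\varepsilon)$, the quantity $1 + p\log p + q\log q$ that appeared as the rate bound in Theorem 2.10.

```python
import math

def H(probs):
    """Entropy in bits (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p):
    """I(X, Y) = sum p(xy) log2(p(x|y) / p(x)) for a joint table p[x][y], as in (3.21)."""
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]
    # p(x|y) / p(x) = p(xy) / (p(x) p(y))
    return sum(p[x][y] * math.log2(p[x][y] / (px[x] * py[y]))
               for x in range(len(px)) for y in range(len(py)) if p[x][y] > 0)

eps = 0.1   # hypothetical crossover probability; the input is uniform on {0, 1}
bsc = [[0.5 * (1 - eps), 0.5 * eps],
       [0.5 * eps, 0.5 * (1 - eps)]]
joint = H([v for row in bsc for v in row])

print(mutual_information(bsc), 1 - H([eps, 1 - eps]))   # equal
print(2 * H([0.5, 0.5]) - joint)                        # (3.22): H(X) + H(Y) - H(XY)
```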


Let us prove an equation about entropy commonly used in the statistical analysis 
of cryptography. 


Theorem 3.3 If $X, Y, Z$ are three information spaces, then

$$H(XY|Z) = H(X|Z) + H(Y|XZ) = H(Y|Z) + H(X|YZ). \tag{3.23}$$

Proof By definition, we have

$$H(XY|Z) = -\sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} p(xyz)\log p(xy|z).$$

By the probability product formula,

$$p(xyz) = p(z)p(xy|z) = p(xz)p(y|xz),$$

so

$$p(xy|z) = \frac{p(xz)p(y|xz)}{p(z)} = p(x|z)p(y|xz).$$

So we have

$$\begin{aligned}
H(XY|Z) &= -\sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} p(xyz)\log\big(p(x|z)p(y|xz)\big) \\
&= -\sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} p(xyz)\big(\log p(x|z) + \log p(y|xz)\big) \\
&= -\sum_{x\in X}\sum_{z\in Z} p(xz)\log p(x|z) - \sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} p(xyz)\log p(y|xz) \\
&= H(X|Z) + H(Y|XZ).
\end{aligned}$$

Similarly, the second formula can be proved.
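Theorem 3.3 can be sanity-checked on a randomly generated joint distribution $p(xyz)$. The helpers `marginal` and `cond_entropy` below are our own. Coordinates are referred to by index (0 for $x$, 1 for $y$, 2 for $z$), and $H(A|B)$ is computed as $-\sum p(xyz)\log_2 p(A|B)$ with $p(A|B) = p(AB)/p(B)$.

```python
import math
import random
from itertools import product

rng = random.Random(7)                         # arbitrary seed
w = {t: rng.random() for t in product(range(2), range(3), range(2))}
s = sum(w.values())
p = {t: v / s for t, v in w.items()}           # random joint distribution p(xyz)

def marginal(keep):
    """Marginal distribution on the coordinates listed in `keep`, e.g. (0, 2) for xz."""
    out = {}
    for t, v in p.items():
        key = tuple(t[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

def cond_entropy(target, given):
    """H(target | given) = -sum p(xyz) log2 p(target | given), in bits."""
    joint, cond = marginal(target + given), marginal(given)
    return -sum(v * math.log2(joint[tuple(t[i] for i in target + given)]
                              / cond[tuple(t[i] for i in given)])
                for t, v in p.items())

lhs = cond_entropy((0, 1), (2,))                                # H(XY|Z)
rhs1 = cond_entropy((0,), (2,)) + cond_entropy((1,), (0, 2))    # H(X|Z) + H(Y|XZ)
rhs2 = cond_entropy((1,), (2,)) + cond_entropy((0,), (1, 2))    # H(Y|Z) + H(X|YZ)
print(lhs, rhs1, rhs2)
```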


Finally, we extend the formula (3.15) to conditional entropy. 
Lemma 3.6 Let $X_1, X_2, \ldots, X_n, Y$ be information spaces. Then

$$H(X_1X_2\cdots X_n|Y) \le H(X_1|Y) + \cdots + H(X_n|Y). \tag{3.24}$$

In particular, when $X_1 = X_2 = \cdots = X_n = X$,

$$H(X^n|Y) \le nH(X|Y). \tag{3.25}$$

Proof We use induction on $n$. The proposition is trivial when $n = 1$. Suppose the proposition holds for $n$, i.e.,

$$H(X_1X_2\cdots X_n|Y) \le H(X_1|Y) + \cdots + H(X_n|Y).$$

For $n + 1$, let $X = X_1X_2\cdots X_n$; then

$$H(X_1X_2\cdots X_{n+1}|Y) = H(XX_{n+1}|Y) = -\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y} p(xzy)\log p(xz|y).$$

From the total probability formula,

$$H(X|Y) + H(X_{n+1}|Y) = -\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y} p(xzy)\log\big(p(x|y)p(z|y)\big).$$

So by Jensen's inequality,

$$\begin{aligned}
H(XX_{n+1}|Y) - H(X|Y) - H(X_{n+1}|Y) &= \sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y} p(xzy)\log\frac{p(x|y)p(z|y)}{p(xz|y)} \\
&\le \log\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y} p(y)p(x|y)p(z|y).
\end{aligned}$$

By the product formula,

$$\sum_{x\in X}\sum_{z\in X_{n+1}}\sum_{y\in Y} p(y)p(x|y)p(z|y) = \sum_{x\in X}\sum_{y\in Y} p(y)p(x|y) = \sum_{x\in X} p(x) = 1,$$

so the right-hand side equals $\log 1 = 0$. Hence, by the inductive hypothesis,

$$H(XX_{n+1}|Y) \le H(X_{n+1}|Y) + H(X|Y) \le H(X_1|Y) + H(X_2|Y) + \cdots + H(X_{n+1}|Y).$$

The proposition holds for $n + 1$, so the Lemma holds.
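Here is a numerical check of (3.24) for $n = 2$. The sketch draws random joint distributions of $(x_1, x_2, y)$ and asserts $H(X_1X_2|Y) \le H(X_1|Y) + H(X_2|Y)$. The helper names, seed and alphabet sizes are illustrative assumptions.

```python
import math
import random
from itertools import product

def marginal(p, keep):
    """Marginal of the joint dict p on the coordinates listed in `keep`."""
    out = {}
    for t, v in p.items():
        key = tuple(t[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

def cond_entropy(p, target, given):
    """H(target | given) in bits; target and given are tuples of coordinate indices."""
    joint, cond = marginal(p, target + given), marginal(p, given)
    return -sum(v * math.log2(joint[tuple(t[i] for i in target + given)]
                              / cond[tuple(t[i] for i in given)])
                for t, v in p.items() if v > 0)

def random_joint(rng, shape):
    """A random joint distribution on range(shape[0]) x range(shape[1]) x ..."""
    w = {t: rng.random() for t in product(*(range(k) for k in shape))}
    s = sum(w.values())
    return {t: v / s for t, v in w.items()}

rng = random.Random(1)
for _ in range(200):
    p = random_joint(rng, (2, 3, 2))       # coordinates: (x1, x2, y)
    lhs = cond_entropy(p, (0, 1), (2,))    # H(X1 X2 | Y)
    rhs = cond_entropy(p, (0,), (2,)) + cond_entropy(p, (1,), (2,))
    assert lhs <= rhs + 1e-12              # inequality (3.24) for n = 2
print("(3.24) held in all trials")
```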


3.3 Redundancy 


Select an alphabet $\mathbb{F}_q$, or the residue class ring $\mathbb{Z}_m$ modulo $m$. Each element of the alphabet is called a character. In the field of communication, the alphabet is also called the source state, and a character is also called a transmission signal. If the length of a $q$-ary code is increased, redundant transmission signals or characters appear in each codeword. The numerical measure of "redundant characters" is called redundancy. Adding redundancy is a technical means of improving the accuracy of codeword transmission, and redundancy is an important mathematical quantity describing this means. We start by proving the following lemma.


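Lemma 3.7 Let $X, Y, Z$ be three information spaces. Then

$$H(X|YZ) \le H(X|Z). \tag{3.26}$$

Proof By the total probability formula,

$$H(X|Z) = -\sum_{x\in X}\sum_{z\in Z} p(xz)\log p(x|z) = -\sum_{x\in X}\sum_{z\in Z}\sum_{y\in Y} p(xyz)\log p(x|z).$$

So, by Jensen's inequality,

$$\begin{aligned}
H(X|YZ) - H(X|Z) &= \sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} p(xyz)\log\frac{p(x|z)}{p(x|yz)} \\
&\le \log\sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} \frac{p(xyz)\,p(x|z)}{p(x|yz)} \\
&= \log\sum_{x\in X}\sum_{y\in Y}\sum_{z\in Z} p(yz)\,p(x|z) \\
&= \log\sum_{x\in X}\sum_{z\in Z} p(z)\,p(x|z) \\
&= 0.
\end{aligned}$$

Thus $H(X|YZ) \le H(X|Z)$. The Lemma holds.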
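Lemma 3.7 says that conditioning on more information cannot increase entropy. It can be spot-checked on random distributions of $(x, y, z)$. The helpers `marginal` and `cond_entropy`, the seed and the alphabet sizes below are our own illustrative choices.

```python
import math
import random
from itertools import product

def marginal(p, keep):
    """Marginal of the joint dict p on the coordinates listed in `keep`."""
    out = {}
    for t, v in p.items():
        key = tuple(t[i] for i in keep)
        out[key] = out.get(key, 0.0) + v
    return out

def cond_entropy(p, target, given):
    """H(target | given) in bits; target and given are tuples of coordinate indices."""
    joint, cond = marginal(p, target + given), marginal(p, given)
    return -sum(v * math.log2(joint[tuple(t[i] for i in target + given)]
                              / cond[tuple(t[i] for i in given)])
                for t, v in p.items() if v > 0)

rng = random.Random(3)
for _ in range(200):
    w = {t: rng.random() for t in product(range(3), range(2), range(2))}   # (x, y, z)
    s = sum(w.values())
    p = {t: v / s for t, v in w.items()}
    assert cond_entropy(p, (0,), (1, 2)) <= cond_entropy(p, (0,), (2,)) + 1e-12   # (3.26)
print("H(X|YZ) <= H(X|Z) in all trials")
```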
Let $X$ be a source state set. Randomly selecting codewords to enter the information transmission channel is a discrete random process. This mathematical model can be constructed and studied on $X$ by means of a family of random variables $\{\xi_i\}_{i\ge1}$ taking values in $X$. First, we assume that the $\xi_i$ all obey the same probability distribution on $X$, which gives a family of information spaces $\{X_i\}_{i\ge1}$. Let $H_0 = \log|X|$ be the entropy of $X$ as an equal probability information space, and for $n > 1$ let

$$H_n = H(X|X^{n-1}), \qquad H_1 = H(X).$$

By Lemma 3.7, $\{H_n\}$ is a monotonically decreasing sequence bounded below, so its limit exists, that is,

$$\lim_{n\to\infty} H_n = a \quad (a \ge 0). \tag{3.27}$$

We now extend the above observation to the general case. Let $\{\xi_i\}_{i\ge1}$ be any family of random variables valued on $X$. For any $n \ge 1$, let

$$X_n = (X, \xi_n), \quad n \ge 1.$$


Definition 3.9 If a source state set $X$ carries a family of random variables $\{\xi_i\}_{i\ge1}$ valued on $X$, then $X$ is called a source.

(i) If $\{\xi_i\}_{i\ge1}$ is a family of independent and identically distributed random variables, $X$ is called a memoryless source.
(ii) If, for any integers $k, t_1, t_2, \ldots, t_k$ and $h$, the random vectors

$$(\xi_{t_1}, \xi_{t_2}, \ldots, \xi_{t_k}) \quad\text{and}\quad (\xi_{t_1+h}, \xi_{t_2+h}, \ldots, \xi_{t_k+h})$$

obey the same joint probability distribution, then $X$ is called a stationary source.
(iii) If $\{\xi_i\}_{i\ge1}$ is a $k$-order Markov process, that is, for all $m > k \ge 1$,

$$p(x_m|x_{m-1}x_{m-2}\cdots x_1) = p(x_m|x_{m-1}x_{m-2}\cdots x_{m-k}), \quad \forall x_1, x_2, \ldots, x_m \in X,$$

then $X$ is called a $k$-order Markov source. In particular, when $k = 1$, i.e.,

$$p(x_m|x_{m-1}x_{m-2}\cdots x_1) = p(x_m|x_{m-1}), \quad \forall x_1, x_2, \ldots, x_m \in X,$$

$X$ is called a Markov source.
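For a concrete stationary source, the sequence $H_n = H(X_n|X_1\cdots X_{n-1})$ can be computed exactly. The sketch below assumes a hypothetical two-state stationary Markov source with transition matrix `P` and stationary distribution $\pi = (0.8, 0.2)$. It computes block entropies by enumeration and obtains $H_n$ from the addition formula of Lemma 3.2.

```python
import math
from itertools import product

P = [[0.9, 0.1],      # hypothetical transitions: P[a][b] = p(x_m = b | x_{m-1} = a)
     [0.4, 0.6]]
pi = [0.8, 0.2]       # stationary distribution, pi P = pi

def path_prob(word):
    """p(x_1 x_2 ... x_n) for the Markov source started in its stationary law pi."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

def block_entropy(n):
    """H(X_1 X_2 ... X_n) in bits, by enumerating all 2^n words."""
    probs = [path_prob(w) for w in product((0, 1), repeat=n)]
    return -sum(q * math.log2(q) for q in probs if q > 0)

def H_n(n):
    """H_n = H(X_n | X_1 ... X_{n-1}) = H(X_1 ... X_n) - H(X_1 ... X_{n-1})."""
    return block_entropy(1) if n == 1 else block_entropy(n) - block_entropy(n - 1)

# Entropy rate of the chain: sum over a of pi_a * H(P[a])
rate = -sum(pi[a] * P[a][b] * math.log2(P[a][b]) for a in (0, 1) for b in (0, 1))
print([round(H_n(n), 6) for n in range(1, 7)], round(rate, 6))
```

Here $H_1 = H(X)$ exceeds the later terms. By the Markov property, $H_n$ is constant for $n \ge 2$ and equal to $\sum_a \pi_a H(P_a)$, so the sequence is nonincreasing, in line with Lemma 3.7.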


Passing from information spaces to sources replaces a single random variable valued on $X$ with an infinite-dimensional random vector, so that the transmission of code $X$ becomes a discrete random process. By definition, we have
Lemma 3.8 Let $X$ be a source state set and $\{\xi_i\}_{i\ge1}$ a family of random variables valued on $X$. Write

$$X_i = (X, \xi_i), \quad i \ge 1. \tag{3.28}$$

(i) If $X$ is a memoryless source, the joint probability distribution on $X$ satisfies

$$p(x_1x_2\cdots x_n) = \prod_{i=1}^{n} p(x_i), \quad x_i \in X_i,\ n \ge 1. \tag{3.29}$$

(ii) If $X$ is a stationary source, then for all integers $t_1, t_2, \ldots, t_k$ $(k \ge 1)$ and $h$, the joint probability distribution satisfies

$$p(x_{t_1}x_{t_2}\cdots x_{t_k}) = p(x_{t_1+h}x_{t_2+h}\cdots x_{t_k+h}), \tag{3.30}$$

where $x_i \in X_i$, $i \ge 1$.
(iii) If $X$ is a stationary Markov source, then for any $m > 1$ and $x_1x_2\cdots x_m \in X_1X_2\cdots X_m$, the conditional probability distribution on $X$ satisfies

$$p(x_m|x_1\cdots x_{m-1}) = p(x_m|x_{m-1}) = P\{\xi_{i+1} = x_m \mid \xi_i = x_{m-1}\}, \quad \forall\, 1 \le i \le m-1. \tag{3.31}$$

Proof (i) and (ii) follow directly from the definition. We only prove (iii). By (ii) of Definition 3.9, for all $i \ge 1$ we have

$$P\{\xi_i = x_{m-1}, \xi_{i+1} = x_m\} = P\{\xi_{m-1} = x_{m-1}, \xi_m = x_m\}$$

and

$$P\{\xi_i = x_{m-1}\} = P\{\xi_{m-1} = x_{m-1}\}.$$

Thus

$$P\{\xi_i = x_{m-1}\}\,P\{\xi_{i+1} = x_m \mid \xi_i = x_{m-1}\} = P\{\xi_{m-1} = x_{m-1}\}\,P\{\xi_m = x_m \mid \xi_{m-1} = x_{m-1}\}.$$

We have

$$P\{\xi_{i+1} = x_m \mid \xi_i = x_{m-1}\} = p(x_m|x_{m-1}).$$

The Lemma holds.


Corollary 3.1 A memoryless source X must be a stationary source. 
Proof Derived directly from Definition 3.9. 


Next, we extend the limit formula in memoryless sources revealed by formula 
(3.27) to general stationary sources. For this purpose, we first prove two lemmas. 
Lemma 3.9 Let $\{f(n)\}_{n\ge 1}$ be a sequence of real numbers satisfying the following subadditivity:
$$f(n+m) \le f(n) + f(m), \quad \forall\, n \ge 1,\ m \ge 1.$$
Then $\lim_{n\to\infty} \frac{1}{n} f(n)$ exists, and
$$\lim_{n\to\infty} \frac{1}{n} f(n) = \inf\left\{ \frac{1}{n} f(n) \,\middle|\, n \ge 1 \right\}. \tag{3.32}$$

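Proof Let
$$\delta = \inf\left\{ \frac{1}{n} f(n) \,\middle|\, n \ge 1 \right\}.$$
First suppose $\delta \ne -\infty$. For any $\varepsilon > 0$, select a positive integer $m$ such that
$$\frac{1}{m} f(m) < \delta + \frac{\varepsilon}{2}.$$
Let $n = am + b$, where $a$ is an integer and $0 \le b < m$. By subadditivity,
$$f(n) \le a f(m) + b f(1).$$
Dividing both sides by $n$, we have
$$\frac{1}{n} f(n) \le \frac{a}{n} f(m) + \frac{b}{n} f(1).$$
For fixed $m$, since $0 \le b < m$, we have $a/n = (n-b)/(mn) \to 1/m$ and $b/n \to 0$ as $n \to \infty$. Hence, when $n$ is large enough,
$$\frac{a}{n} f(m) + \frac{b}{n} f(1) < \frac{1}{m} f(m) + \frac{\varepsilon}{2}.$$
So
$$\frac{1}{n} f(n) < \frac{1}{m} f(m) + \frac{\varepsilon}{2} < \delta + \varepsilon. \tag{3.33}$$
Thus
$$\delta \le \liminf_{n\to\infty} \frac{1}{n} f(n) \le \limsup_{n\to\infty} \frac{1}{n} f(n) \le \delta + \varepsilon.$$
Since $\varepsilon$ is arbitrary,
$$\lim_{n\to\infty} \frac{1}{n} f(n) = \delta.$$
If $\delta = -\infty$, then for every real number $M$ we can choose $m$ with $f(m)/m < M$, and the argument leading to (3.33) gives $\limsup_{n\to\infty} \frac{1}{n} f(n) \le M$. Hence
$$\limsup_{n\to\infty} \frac{1}{n} f(n) = -\infty,$$
so we still have
$$\lim_{n\to\infty} \frac{1}{n} f(n) = \delta = -\infty.$$
The Lemma holds.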
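As a small numerical illustration of Lemma 3.9 (our own example, not from the text), take $f(n) = \lceil n \log_2 3 \rceil$, the number of bits needed to index all $3^n$ ternary strings of length $n$. Since $\lceil a + b \rceil \le \lceil a \rceil + \lceil b \rceil$, this sequence is subadditive, and the lemma predicts $f(n)/n \to \inf_n f(n)/n = \log_2 3$. The sketch below computes $f(n)$ exactly with integer arithmetic; the function names are hypothetical.

```python
import math


def bits_for_ternary(n: int) -> int:
    """f(n) = ceil(n * log2(3)), computed exactly as the bit length of 3**n.

    For n >= 1, 3**n is never a power of two, so 2**(b-1) < 3**n < 2**b
    with b = (3**n).bit_length(); hence ceil(log2(3**n)) = b.
    """
    return (3 ** n).bit_length()


def check_subadditive(f, n_max: int) -> bool:
    """Check f(n + m) <= f(n) + f(m) for all n, m >= 1 with n + m <= n_max."""
    return all(f(n + m) <= f(n) + f(m)
               for n in range(1, n_max)
               for m in range(1, n_max - n + 1))


if __name__ == "__main__":
    print("subadditive:", check_subadditive(bits_for_ternary, 60))
    for n in (1, 10, 100, 1000):
        print(n, bits_for_ternary(n) / n)   # decreases toward log2(3)
    print("log2(3) =", math.log2(3))
```

Here $f(n)/n - \log_2 3$ always lies in $[0, 1/n)$, so the infimum is approached but never attained.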
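Lemma 3.10 Let $\{a_n\}_{n\ge 1}$ be a sequence of real numbers with $\lim_{n\to\infty} a_n = a$. Then
$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} a_i = a.$$
Proof For any $\varepsilon > 0$, choose $N$ such that $|a_i - a| < \varepsilon$ for all $i > N$. Then for $n > N$,
$$\left| \frac{1}{n} \sum_{i=1}^{n} a_i - a \right| = \left| \frac{1}{n} \sum_{i=1}^{n} (a_i - a) \right| \le \frac{1}{n} \sum_{i=1}^{N} |a_i - a| + \frac{1}{n} \sum_{i=N+1}^{n} |a_i - a| \le \frac{1}{n} \sum_{i=1}^{N} |a_i - a| + \frac{n-N}{n}\varepsilon \le \frac{1}{n} \sum_{i=1}^{N} |a_i - a| + \varepsilon.$$
Once $\varepsilon > 0$ is given, $N$ is fixed accordingly, so the first term of the above formula tends to $0$ as $n \to \infty$. Hence for any $\varepsilon > 0$ there is $N_0$ such that, when $n > N_0$,
$$\left| \frac{1}{n} \sum_{i=1}^{n} a_i - a \right| < 2\varepsilon.$$
Thus
$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} a_i = a.$$
The Lemma holds.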
With the above preparations, we now give the main results of this section. 
Theorem 3.4 Let $X$ be any source and $\{\xi_i\}_{i\ge 1}$ a sequence of random variables taking values in $X$. For any positive integer $n \ge 1$, let
$$X_n = (X, \xi_n), \quad n \ge 1.$$
Then, when $X$ is a stationary source, the following two limits exist and are equal:
$$\lim_{n\to\infty} \frac{1}{n} H(X_1 X_2 \cdots X_n) = \lim_{n\to\infty} H(X_n \mid X_1 X_2 \cdots X_{n-1}).$$
We denote this common limit by $H_\infty(X)$.


Proof Because $X$ is a stationary source, for any $n \ge 1$, $m \ge 1$, the joint probability distribution of the random vector $(\xi_{n+1}, \xi_{n+2}, \ldots, \xi_{n+m})$ is equal to that of $(\xi_1, \xi_2, \ldots, \xi_m)$; therefore,
$$H(X_1 X_2 \cdots X_m) = H(X_{n+1} X_{n+2} \cdots X_{n+m}). \tag{3.34}$$
By Theorem 3.2,
$$H(X_1 \cdots X_n X_{n+1} \cdots X_{n+m}) \le H(X_1 \cdots X_n) + H(X_{n+1} \cdots X_{n+m}) = H(X_1 \cdots X_n) + H(X_1 \cdots X_m).$$
Let $f(n) = H(X_1 \cdots X_n)$. Then $f(n+m) \le f(n) + f(m)$, so $\{f(n)\}_{n\ge 1}$ is a nonnegative subadditive sequence of real numbers. By Lemma 3.9,
$$\lim_{n\to\infty} \frac{1}{n} H(X_1 X_2 \cdots X_n) = \inf\left\{ \frac{1}{n} H(X_1 X_2 \cdots X_n) \,\middle|\, n \ge 1 \right\} \ge 0.$$
Next, we prove that the second limit
$$\lim_{n\to\infty} H(X_n \mid X_1 X_2 \cdots X_{n-1})$$
exists. First, we prove that this sequence is monotonically decreasing. Because $X$ is a stationary source,
$$H(X_1 X_2 \cdots X_{n-1}) = H(X_2 X_3 \cdots X_n)$$
and
$$H(X_2 X_3 \cdots X_n X_{n+1}) = H(X_1 X_2 \cdots X_n).$$
So
$$H(X_{n+1} \mid X_2 X_3 \cdots X_n) = H(X_n \mid X_1 X_2 \cdots X_{n-1}). \tag{3.35}$$
By Lemma 3.7,
$$H(X_{n+1} \mid X_1 X_2 \cdots X_n) \le H(X_{n+1} \mid X_2 X_3 \cdots X_n) = H(X_n \mid X_1 X_2 \cdots X_{n-1}).$$
So $\{H(X_n \mid X_1 X_2 \cdots X_{n-1})\}_{n\ge 1}$ is a monotonically decreasing sequence with lower bound $0$, and hence $\lim_{n\to\infty} H(X_n \mid X_1 X_2 \cdots X_{n-1})$ exists. Further, by the addition formula of Lemma 3.2,
$$\frac{1}{n} H(X_1 X_2 \cdots X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid X_1 X_2 \cdots X_{i-1}).$$
By Lemma 3.10, we finally have
$$\lim_{n\to\infty} \frac{1}{n} H(X_1 X_2 \cdots X_n) = \lim_{n\to\infty} H(X_n \mid X_1 X_2 \cdots X_{n-1}) = H_\infty(X).$$
This completes the proof of the Theorem.


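We call $H_\infty(X)$ the entropy rate of the source $X$. Obviously, we have the following corollary.

Corollary 3.2 (i) For any stationary source $X$,
$$H_\infty(X) \le H(X_1) \le \log |X|.$$
(ii) If $X$ is a memoryless source, then
$$H_\infty(X) = H(X_1).$$
(iii) If $X$ is a stationary Markov source, then
$$H_\infty(X) = H(X_2 \mid X_1).$$
Proof Since $\{H(X_n \mid X_1 \cdots X_{n-1})\}_{n\ge 1}$ is a monotonically decreasing sequence whose first term is $H(X_1)$,
$$H_\infty(X) \le H(X_1),$$
and $H(X_1) \le \log|X|$ by Theorem 3.1. That is, (i) holds. If $X$ is a memoryless source, then by (3.29),
$$H(X_1 \cdots X_n) = -\sum_{x_1 \in X_1} \cdots \sum_{x_n \in X_n} p(x_1 x_2 \cdots x_n) \log p(x_1 x_2 \cdots x_n) = -\sum_{x_1 \in X_1} \cdots \sum_{x_n \in X_n} p(x_1 x_2 \cdots x_n) \{\log p(x_1) + \cdots + \log p(x_n)\} = n H(X_1),$$
since the $X_i$ are identically distributed. So we have
$$H_\infty(X) = H(X_1).$$
Similarly, we can prove (iii): by (3.31), $H(X_n \mid X_1 \cdots X_{n-1}) = H(X_n \mid X_{n-1}) = H(X_2 \mid X_1)$ for all $n \ge 2$.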
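The sketch below illustrates Theorem 3.4 and Corollary 3.2(iii) on a hypothetical two-state stationary Markov source (the transition matrix is an arbitrary choice, not taken from the text). It enumerates all strings of length $n$, computes the block entropy $H(X_1 \cdots X_n)$ and the conditional entropy $H(X_n \mid X_1 \cdots X_{n-1}) = H(X_1 \cdots X_n) - H(X_1 \cdots X_{n-1})$, and compares both with $H(X_2 \mid X_1)$. Entropies are in bits; function names are our own.

```python
import itertools
import math

# Hypothetical transition matrix: P[a][b] = P{xi_{i+1} = b | xi_i = a}.
P = [[0.9, 0.1],
     [0.4, 0.6]]


def stationary(P):
    """Stationary distribution of a 2-state chain (balance: pi_0 P[0][1] = pi_1 P[1][0])."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def block_entropy(P, pi, n):
    """H(X_1 ... X_n) in bits for the chain started from its stationary law pi."""
    h = 0.0
    for x in itertools.product(range(len(pi)), repeat=n):
        p = pi[x[0]]
        for s, t in zip(x, x[1:]):
            p *= P[s][t]
        if p > 0:
            h -= p * math.log2(p)
    return h


def conditional_entropy_rate(P, pi):
    """H(X_2 | X_1) = -sum_a pi_a sum_b P[a][b] log2 P[a][b]."""
    return -sum(pi[a] * P[a][b] * math.log2(P[a][b])
                for a in range(len(pi)) for b in range(len(pi)) if P[a][b] > 0)


if __name__ == "__main__":
    pi = stationary(P)
    h_prev = 0.0
    for n in range(1, 11):
        h_n = block_entropy(P, pi, n)
        print(f"n={n:2d}  H/n={h_n / n:.4f}  H(X_n|X_1..X_n-1)={h_n - h_prev:.4f}")
        h_prev = h_n
    print("H(X_2|X_1) =", round(conditional_entropy_rate(P, pi), 4))
```

For $n \ge 2$ the conditional entropies are already constant, while $\frac{1}{n}H(X_1 \cdots X_n)$ decreases slowly toward the same value, as Lemma 3.10 suggests.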
Definition 3.10 Let $X$ be a stationary source. We define
$$\delta = \log|X| - H_\infty(X), \qquad r = 1 - \frac{H_\infty(X)}{\log|X|}. \tag{3.36}$$
$\delta$ is called the redundancy of the source $X$, and $r$ the relative redundancy of $X$.

We write
$$H_0 = \log|X|, \qquad H_n = H(X_n \mid X_1 X_2 \cdots X_{n-1}), \quad n \ge 1.$$
By Theorem 3.4, $H_\infty(X) = H_\infty \le H_n$, so
$$H_n \ge (1 - r) H_0, \quad \forall\, n \ge 1. \tag{3.37}$$


In information theory, redundancy is used to describe the effectiveness of the 
information carried by the source output symbol. The smaller the redundancy, the 
higher the effectiveness of the information carried by the source output symbol, and 
vice versa. 
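A minimal sketch of (3.36), assuming a memoryless source so that $H_\infty(X) = H(X_1)$ by Corollary 3.2(ii); the probabilities $(1/2, 1/4, 1/8, 1/8)$ are an illustrative choice, and logarithms are to base 2.

```python
import math


def entropy(probs):
    """H(X) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(h_inf: float, alphabet_size: int):
    """Return (delta, r) from (3.36): delta = log|X| - H_inf, r = 1 - H_inf / log|X|."""
    h0 = math.log2(alphabet_size)
    return h0 - h_inf, 1 - h_inf / h0


if __name__ == "__main__":
    # Memoryless source: H_inf(X) = H(X_1) by Corollary 3.2(ii).
    probs = [1/2, 1/4, 1/8, 1/8]
    print(redundancy(entropy(probs), len(probs)))   # (0.25, 0.125)
```

An equiprobable source has zero redundancy, which matches the remark that smaller redundancy means each output symbol carries information more effectively.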


3.4 Markov Chain 


Let $X, Y, Z$ be three information spaces. If the conditional probability formula
$$p(xy \mid z) = p(x \mid z)\, p(y \mid z) \tag{3.38}$$
holds, we say that $X$ and $Y$ are statistically independent given $Z$.


Definition 3.11 If the information spaces $X$ and $Y$ are statistically independent given $Z$, then $(X, Z, Y)$ is called a Markov chain, denoted $X \to Z \to Y$.


Theorem 3.5 $X \to Z \to Y$ is a Markov chain if and only if the probability of the joint event $xzy$ is
$$p(xzy) = p(x)\, p(z \mid x)\, p(y \mid z), \tag{3.39}$$
if and only if
$$p(xzy) = p(y)\, p(z \mid y)\, p(x \mid z). \tag{3.40}$$


Proof If $X \to Z \to Y$ is a Markov chain, then $p(xy \mid z) = p(x \mid z)\, p(y \mid z)$, thus
$$p(xzy) = p(z)\, p(xy \mid z) = p(z)\, p(x \mid z)\, p(y \mid z) = p(x)\, p(z \mid x)\, p(y \mid z).$$
Similarly,
$$p(xzy) = p(z)\, p(y \mid z)\, p(x \mid z) = p(y)\, p(z \mid y)\, p(x \mid z).$$
That is, (3.39) and (3.40) hold. Conversely, if (3.39) holds, then
$$p(xzy) = p(x)\, p(z \mid x)\, p(y \mid z) = p(z)\, p(x \mid z)\, p(y \mid z).$$
On the other hand, by the product formula,
$$p(xzy) = p(z)\, p(xy \mid z).$$
So we have
$$p(xy \mid z) = p(x \mid z)\, p(y \mid z).$$
That is, $X \to Z \to Y$ is a Markov chain. Similarly, if (3.40) holds, then $X \to Z \to Y$ is also a Markov chain. The Theorem holds.


According to the above Theorem, or directly by Definition 3.11, if $X \to Z \to Y$ is a Markov chain, then $Y \to Z \to X$ is also a Markov chain.


Definition 3.12 Let $U, X, Z, Y$ be four information spaces such that the probability of the joint event $uxzy$ is
$$p(uxzy) = p(u)\, p(x \mid u)\, p(z \mid x)\, p(y \mid z). \tag{3.41}$$
Then $U, X, Z, Y$ is called a Markov chain, denoted $U \to X \to Z \to Y$.

Theorem 3.6 If $U \to X \to Z \to Y$ is a Markov chain, then $U \to X \to Z$ and $U \to Z \to Y$ are also Markov chains.


Proof Assume that $U \to X \to Z \to Y$ is a Markov chain; then
$$p(uxzy) = p(u)\, p(x \mid u)\, p(z \mid x)\, p(y \mid z).$$
Summing both sides over $y \in Y$ and noting that $\sum_{y \in Y} p(y \mid z) = 1$, we get
$$p(uxz) = p(u)\, p(x \mid u)\, p(z \mid x).$$
By Theorem 3.5, $U \to X \to Z$ is a Markov chain. The left side of the above formula can be expressed as
$$p(uxz) = p(ux)\, p(z \mid ux),$$
and $p(ux) = p(u)\, p(x \mid u)$, so we have
$$p(z \mid ux) = p(z \mid x).$$
Because $U \to X \to Z \to Y$ is a Markov chain,
$$p(uxzy) = p(u)\, p(x \mid u)\, p(z \mid x)\, p(y \mid z) = p(ux)\, p(z \mid ux)\, p(y \mid z) = p(uxz)\, p(y \mid z).$$
Summing both sides over $x \in X$, we have
$$p(uzy) = p(uz)\, p(y \mid z) = p(u)\, p(z \mid u)\, p(y \mid z).$$
Thus, by Theorem 3.5, $U \to Z \to Y$ is also a Markov chain. The Theorem holds.


In the previous section, we defined the mutual information $I(X, Y)$ of two information spaces $X$ and $Y$ as
$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(xy) \log \frac{p(xy)}{p(x)\, p(y)}.$$
Now we define the mutual information $I(X, Y \mid Z)$ of $X$ and $Y$ given $Z$ as
$$I(X, Y \mid Z) = \sum_{x \in X} \sum_{y \in Y} \sum_{z \in Z} p(xyz) \log \frac{p(xy \mid z)}{p(x \mid z)\, p(y \mid z)}. \tag{3.42}$$
By definition, we have
$$I(X, Y \mid Z) = I(Y, X \mid Z). \tag{3.43}$$
$I(X, Y \mid Z)$ is called the conditional mutual information of $X$ and $Y$.
For conditional mutual information, we first prove the following formula. 
Theorem 3.7 Let $X, Y, Z$ be three information spaces. Then
$$I(X, Y \mid Z) = H(X \mid Z) - H(X \mid YZ) \tag{3.44}$$
and
$$I(X, Y \mid Z) = H(Y \mid Z) - H(Y \mid XZ). \tag{3.45}$$


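Proof We only prove (3.44); the proof of (3.45) is the same. Because $p(x \mid yz) = p(xy \mid z)/p(y \mid z)$,
$$H(X \mid Z) - H(X \mid YZ) = \sum_{x \in X} \sum_{y \in Y} \sum_{z \in Z} p(xyz) \log \frac{p(x \mid yz)}{p(x \mid z)} = \sum_{x \in X} \sum_{y \in Y} \sum_{z \in Z} p(xyz) \log \frac{p(xy \mid z)}{p(x \mid z)\, p(y \mid z)} = I(X, Y \mid Z).$$
So (3.44) holds.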
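Corollary 3.3 $I(X, Y \mid Z) = 0$ if and only if $X \to Z \to Y$ is a Markov chain.
Proof If $X \to Z \to Y$ is a Markov chain, then by (3.42) every term satisfies
$$\log \frac{p(xy \mid z)}{p(x \mid z)\, p(y \mid z)} = \log 1 = 0,$$
that is, $I(X, Y \mid Z) = 0$. Conversely, $I(X, Y \mid Z)$ is the average, weighted by $p(z)$, of the relative entropies (3.70) between the distributions $p(xy \mid z)$ and $p(x \mid z)\, p(y \mid z)$; by Lemma 3.17 it vanishes only if $p(xy \mid z) = p(x \mid z)\, p(y \mid z)$ whenever $p(z) > 0$, that is, only if $X \to Z \to Y$ is a Markov chain.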
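The following sketch checks Corollary 3.3 numerically by evaluating (3.42) directly. Joint distributions are stored as Python dicts from tuples to probabilities; the particular numbers and the helper names are illustrative assumptions. A chain built from (3.39) gives $I(X, Y \mid Z) \approx 0$, while a joint law in which $Y$ copies $X$ and $Z$ is independent noise gives 1 bit.

```python
import itertools
import math


def marginal(p, idx):
    """Marginal pmf of p (dict: tuple -> probability) on the coordinates idx."""
    m = {}
    for key, v in p.items():
        sub = tuple(key[i] for i in idx)
        m[sub] = m.get(sub, 0.0) + v
    return m


def conditional_mi(p, a, b, c):
    """I(A, B | C) in bits via (3.42); a, b, c are tuples of coordinate indices."""
    pabc = marginal(p, a + b + c)
    pac, pbc, pc = marginal(p, a + c), marginal(p, b + c), marginal(p, c)
    la, lb = len(a), len(b)
    total = 0.0
    for k, v in pabc.items():
        if v > 0:
            ka, kb, kc = k[:la], k[la:la + lb], k[la + lb:]
            # p(ab|c) / (p(a|c) p(b|c)) = p(abc) p(c) / (p(ac) p(bc))
            total += v * math.log2(v * pc[kc] / (pac[ka + kc] * pbc[kb + kc]))
    return total


def markov_joint(px, pz_given_x, py_given_z):
    """Joint pmf over (x, z, y) with p(xzy) = p(x) p(z|x) p(y|z), formula (3.39)."""
    nx, nz, ny = len(px), len(py_given_z), len(py_given_z[0])
    return {(x, z, y): px[x] * pz_given_x[x][z] * py_given_z[z][y]
            for x, z, y in itertools.product(range(nx), range(nz), range(ny))}


if __name__ == "__main__":
    X, Z, Y = (0,), (1,), (2,)                      # coordinate positions
    p = markov_joint([0.3, 0.7], [[0.8, 0.2], [0.1, 0.9]], [[0.6, 0.4], [0.25, 0.75]])
    print(conditional_mi(p, X, Y, Z))               # ~0, up to rounding
    q = {(x, z, x): 0.25 for x in (0, 1) for z in (0, 1)}   # not a Markov chain
    print(conditional_mi(q, X, Y, Z))               # 1.0
```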


Conditional mutual information can be used to establish the addition formula of mutual information.


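Corollary 3.4 (Addition formula of mutual information) If $X_1, X_2, \ldots, X_n, Y$ are information spaces, then
$$I(X_1 X_2 \cdots X_n, Y) = \sum_{i=1}^{n} I(X_i, Y \mid X_1 \cdots X_{i-1}). \tag{3.46}$$
In particular, when $n = 2$, we have
$$I(X_1 X_2, Y) = I(X_1, Y) + I(X_2, Y \mid X_1). \tag{3.47}$$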


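Proof By Lemma 3.4 and the chain rule of (conditional) entropy,
$$I(X_1 \cdots X_n, Y) = H(X_1 \cdots X_n) - H(X_1 \cdots X_n \mid Y) = \sum_{i=1}^{n} H(X_i \mid X_1 \cdots X_{i-1}) - \sum_{i=1}^{n} H(X_i \mid X_1 \cdots X_{i-1} Y).$$
By (3.44) of Theorem 3.7, each difference $H(X_i \mid X_1 \cdots X_{i-1}) - H(X_i \mid X_1 \cdots X_{i-1} Y)$ equals $I(X_i, Y \mid X_1 \cdots X_{i-1})$, so
$$I(X_1 X_2 \cdots X_n, Y) = \sum_{i=1}^{n} I(X_i, Y \mid X_1 \cdots X_{i-1}).$$
Therefore, the corollary holds.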
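A numerical check of (3.47), computing each side straight from the definitions of $I(X, Y)$ and (3.42) rather than from entropies (so the check is not an algebraic tautology). The random joint distribution of $(X_1, X_2, Y)$ and the helper names are illustrative assumptions.

```python
import itertools
import math
import random


def marginal(p, idx):
    """Marginal pmf of p (dict: tuple -> probability) on the coordinates idx."""
    m = {}
    for key, v in p.items():
        sub = tuple(key[i] for i in idx)
        m[sub] = m.get(sub, 0.0) + v
    return m


def mutual_information(p, a, b):
    """I(A, B) in bits from its definition; a, b are tuples of coordinates."""
    pab, pa, pb = marginal(p, a + b), marginal(p, a), marginal(p, b)
    la = len(a)
    return sum(v * math.log2(v / (pa[k[:la]] * pb[k[la:]]))
               for k, v in pab.items() if v > 0)


def conditional_mi(p, a, b, c):
    """I(A, B | C) in bits via (3.42)."""
    pabc = marginal(p, a + b + c)
    pac, pbc, pc = marginal(p, a + c), marginal(p, b + c), marginal(p, c)
    la, lb = len(a), len(b)
    total = 0.0
    for k, v in pabc.items():
        if v > 0:
            ka, kb, kc = k[:la], k[la:la + lb], k[la + lb:]
            total += v * math.log2(v * pc[kc] / (pac[ka + kc] * pbc[kb + kc]))
    return total


def random_joint(n_vars, k, seed=0):
    """A random strictly positive joint pmf on {0..k-1}^n_vars (illustrative)."""
    rng = random.Random(seed)
    keys = list(itertools.product(range(k), repeat=n_vars))
    w = [rng.random() + 1e-3 for _ in keys]
    s = sum(w)
    return {key: wi / s for key, wi in zip(keys, w)}


if __name__ == "__main__":
    p = random_joint(3, 3)                 # coordinates: 0 -> X1, 1 -> X2, 2 -> Y
    lhs = mutual_information(p, (0, 1), (2,))
    rhs = mutual_information(p, (0,), (2,)) + conditional_mi(p, (1,), (2,), (0,))
    print(lhs, rhs)                        # equal up to rounding
```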


Finally, we use Markov chains to prove inequalities for mutual information.


Theorem 3.8 Suppose $X \to Z \to Y$ is a Markov chain. Then
$$I(X, Y) \le I(X, Z) \tag{3.48}$$
and
$$I(X, Y) \le I(Y, Z). \tag{3.49}$$


Proof We only prove (3.48); the proof of (3.49) is the same. Applying (3.47) in two ways,
$$I(X, YZ) = I(X, Y) + I(X, Z \mid Y) = I(X, Z) + I(X, Y \mid Z).$$
Thus, since $I(X, Z \mid Y) \ge 0$,
$$I(X, Y) = I(X, YZ) - I(X, Z \mid Y) \le I(X, YZ) = I(X, Z) + I(X, Y \mid Z) = I(X, Z).$$
In the last step we used the Markov chain condition, which by Corollary 3.3 gives $I(X, Y \mid Z) = 0$. The Theorem holds.


Theorem 3.9 (Data processing inequality) Suppose $U \to X \to Y \to V$ is a Markov chain. Then
$$I(U, V) \le I(X, Y).$$
Proof By Theorem 3.6, $U \to X \to Y$ and $U \to Y \to V$ are Markov chains. By (3.49) and (3.48) of Theorem 3.8, respectively,
$$I(U, Y) \le I(X, Y)$$
and
$$I(U, V) \le I(U, Y).$$
Thus
$$I(U, V) \le I(X, Y).$$
The Theorem holds.
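To see the data processing inequality at work, the sketch below builds random Markov chains $U \to X \to Y \to V$ in the product form of (3.41) (random transition matrices; all names and sizes are illustrative assumptions) and checks the three inequalities used in the proof.

```python
import itertools
import math
import random


def random_stochastic(rows, cols, rng):
    """A random row-stochastic matrix (each row is a pmf)."""
    m = []
    for _ in range(rows):
        w = [rng.random() + 1e-3 for _ in range(cols)]
        s = sum(w)
        m.append([x / s for x in w])
    return m


def chain_joint(k, seed):
    """Joint pmf p(u,x,y,v) = p(u) p(x|u) p(y|x) p(v|y) of a hypothetical chain U -> X -> Y -> V."""
    rng = random.Random(seed)
    pu = random_stochastic(1, k, rng)[0]
    a, b, c = (random_stochastic(k, k, rng) for _ in range(3))
    return {(u, x, y, v): pu[u] * a[u][x] * b[x][y] * c[y][v]
            for u, x, y, v in itertools.product(range(k), repeat=4)}


def mutual_information(p, i, j):
    """Mutual information (bits) between coordinates i and j of the joint pmf p."""
    pij, pa, pb = {}, {}, {}
    for key, v in p.items():
        pij[key[i], key[j]] = pij.get((key[i], key[j]), 0.0) + v
        pa[key[i]] = pa.get(key[i], 0.0) + v
        pb[key[j]] = pb.get(key[j], 0.0) + v
    return sum(v * math.log2(v / (pa[s] * pb[t])) for (s, t), v in pij.items() if v > 0)


if __name__ == "__main__":
    U, X, Y, V = 0, 1, 2, 3
    p = chain_joint(3, seed=0)
    print("I(U,V) =", mutual_information(p, U, V))
    print("I(U,Y) =", mutual_information(p, U, Y))
    print("I(X,Y) =", mutual_information(p, X, Y))
```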


3.5 Source Coding Theorem 


Information coding theory is usually divided into two parts: channel coding and source coding. Channel coding ensures the success rate of decoding by increasing the length of codewords; it is also known as error-correcting coding and is discussed in detail in Chap. 2. Source coding compresses data that contain redundant information, while keeping the success rate of decoding and recovery high after the information or data is stored. Another important result of Shannon's theory is that good codes exist in source coding: they use as few codewords as possible, which improves storage efficiency, while the error of decoding and restoration can be made arbitrarily small. Such source codes are called typical codes. Shannon first proved the asymptotic equipartition property of 'block codes' for memoryless sources, and from it derived the statistical characteristics of typical codes. At Shannon's suggestion, McMillan (1953) and Breiman (1957) proved a similar asymptotic equipartition property for stationary ergodic sources. This is the famous Shannon-McMillan-Breiman theorem of source coding, which constitutes the core of modern typical code theory. The main purpose of this section is to give a rigorous proof of the asymptotic equipartition property for memoryless sources, and to derive from it the source coding theorem for data compression (see Theorem 3.10). For the more general Shannon-McMillan-Breiman theorem, Chap. 2 of Ye Zhongxing's Fundamentals of Information Theory (see Zhongxing, 2003 in reference 3) gives a proof for stationary ergodic Markov sources; interested readers can refer to it or to the original papers (see McMillan, 1953; Moy, 1961; Shannon, 1959 in reference 3).

First, let $X = (X, \xi)$ be an information space. The entropy $H(X)$ essentially depends only on the probability function $p(x)$ ($x \in X$) of the random variable $\xi$. According to $p(x)$, we can define the following random variables:
$$\eta_1 = p(X), \qquad \eta_2 = \log p(X). \tag{3.50}$$
Their probability functions are determined by that of $\xi$: when $\xi = x$, which happens with probability $p(x)$, $\eta_1$ takes the value $p(x)$ and $\eta_2$ takes the value $\log p(x)$. (3.51)
It is easy to see that the expected value of $\eta_2$ satisfies
$$-E(\eta_2) = -E(\log p(X)) = -\sum_{x \in X} p(x) \log p(x) = H(X). \tag{3.52}$$
Therefore, we can regard the entropy $H(X)$ as the mathematical expectation of the random variable $\log \frac{1}{p(X)}$.


Lemma 3.11 Let $X$ be a memoryless source, and let $p(X^n)$ and $\log p(X^n)$ be the two random variables defined as above on the power space $X^n$. Then $-\frac{1}{n} \log p(X^n)$ converges in probability to $H(X)$, that is,
$$-\frac{1}{n} \log p(X^n) \xrightarrow{P} H(X).$$
n 


Proof Since $X$ is a memoryless source, $\{\xi_i\}_{i\ge 1}$ is a sequence of independent and identically distributed random variables. With $X_i = (X, \xi_i)$ ($i \ge 1$) and the power space $X^n = X_1 X_2 \cdots X_n$ ($n \ge 1$), we have
$$p(X^n) = p(X_1)\, p(X_2) \cdots p(X_n), \qquad \log p(X^n) = \sum_{i=1}^{n} \log p(X_i).$$
Because $\{\xi_i\}$ are independent and identically distributed, $\{p(X_i)\}$ and $\{\log p(X_i)\}$ are also sequences of independent and identically distributed random variables. By Chebyshev's law of large numbers (see Theorem 1.3 of Chap. 1),
$$-\frac{1}{n} \log p(X^n) = -\frac{1}{n} \sum_{i=1}^{n} \log p(X_i)$$
converges in probability to the common expected value $H(X)$, where
$$E\left( \log \frac{1}{p(X_1)} \right) = \cdots = E\left( \log \frac{1}{p(X_n)} \right) = H(X).$$
Hence for any $\varepsilon > 0$, when $n$ is sufficiently large,
$$P\left\{ \left| -\frac{1}{n} \log p(X^n) - H(X) \right| < \varepsilon \right\} > 1 - \varepsilon. \tag{3.53}$$
The proof is completed.
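A Monte Carlo sketch of Lemma 3.11 (not part of the original text): it samples i.i.d. strings, computes $-\frac{1}{n}\log_2 p(x_1 \cdots x_n)$, and estimates the probability in (3.53). The source distribution, trial counts and seed are illustrative assumptions.

```python
import math
import random


def empirical_rate(probs, n, rng):
    """Draw x_1..x_n i.i.d. from probs and return -(1/n) log2 p(x_1 ... x_n)."""
    xs = rng.choices(range(len(probs)), weights=probs, k=n)
    return -sum(math.log2(probs[x]) for x in xs) / n


def fraction_within(probs, n, eps, trials=500, seed=1):
    """Estimate P{ | -(1/n) log2 p(X^n) - H(X) | < eps } by simulation."""
    rng = random.Random(seed)
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    hits = sum(abs(empirical_rate(probs, n, rng) - h) < eps for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.125]          # H(X) = 1.75 bits
    for n in (10, 100, 1000):
        print(n, fraction_within(probs, n, eps=0.1))
```

As $n$ grows the estimated probability approaches 1, which is the content of (3.53); for an equiprobable source every string gives exactly $H(X)$.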


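Definition 3.13 Let $X$ be a memoryless source. The power space $X^n$, also known as a block code, is
$$X^n = \{ x = x_1 x_2 \cdots x_n \mid x_i \in X,\ 1 \le i \le n \}, \quad n \ge 1. \tag{3.54}$$
For any given $\varepsilon > 0$ and $n \ge 1$, we define the typical code, or typical sequence set, $W_\varepsilon^{(n)}$ in the power space $X^n$ as
$$W_\varepsilon^{(n)} = \left\{ x = x_1 \cdots x_n \in X^n \,\middle|\, \left| -\frac{1}{n} \log p(x) - H(X) \right| < \varepsilon \right\}. \tag{3.55}$$
By definition, for any $\varepsilon > 0$ and $n \ge 1$,
$$W_\varepsilon^{(n)} \subset X^n, \qquad |X^n| = |X|^n. \tag{3.56}$$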


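Lemma 3.12 (Asymptotic equipartition) Let $|W_\varepsilon^{(n)}|$ denote the number of codewords in the typical code $W_\varepsilon^{(n)}$. Then for any $\varepsilon > 0$, with logarithms to base 2 (binary channels), we have
$$(1 - \varepsilon)\, 2^{n(H(X) - \varepsilon)} \le |W_\varepsilon^{(n)}| \le 2^{n(H(X) + \varepsilon)}, \tag{3.57}$$
where the upper bound holds for every $n \ge 1$ and the lower bound for sufficiently large $n$.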


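Proof By Lemma 3.11 and (3.53), for sufficiently large $n$,
$$P\left\{ \left| -\frac{1}{n} \log p(X^n) - H(X) \right| < \varepsilon \right\} > 1 - \varepsilon.$$
In other words, for all codewords $x = x_1 x_2 \cdots x_n \in W_\varepsilon^{(n)}$,
$$H(X) - \varepsilon < -\frac{1}{n} \log p(x) < H(X) + \varepsilon.$$
Equivalently, in the binary case,
$$2^{-n(H(X) + \varepsilon)} < p(x) < 2^{-n(H(X) - \varepsilon)}. \tag{3.58}$$
Denote the probability of occurrence of $W_\varepsilon^{(n)}$ by $P\{W_\varepsilon^{(n)}\}$; then
$$P\{W_\varepsilon^{(n)}\} = P\{x \in X^n : x \in W_\varepsilon^{(n)}\} > 1 - \varepsilon.$$
On the other hand,
$$P\{W_\varepsilon^{(n)}\} = \sum_{x \in W_\varepsilon^{(n)}} p(x),$$
so by (3.58),
$$|W_\varepsilon^{(n)}| \cdot 2^{-n(H(X) + \varepsilon)} \le P\{W_\varepsilon^{(n)}\} \le 1.$$
So
$$|W_\varepsilon^{(n)}| \le 2^{n(H(X) + \varepsilon)}.$$
Again by (3.58),
$$|W_\varepsilon^{(n)}| \cdot 2^{-n(H(X) - \varepsilon)} \ge P\{W_\varepsilon^{(n)}\} > 1 - \varepsilon.$$
So we have
$$|W_\varepsilon^{(n)}| > (1 - \varepsilon)\, 2^{n(H(X) - \varepsilon)}.$$
Combining the above inequalities, we have
$$(1 - \varepsilon)\, 2^{n(H(X) - \varepsilon)} \le |W_\varepsilon^{(n)}| \le 2^{n(H(X) + \varepsilon)}.$$
This completes the proof.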


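By Lemma 3.12, for a memoryless source $X$, the probability $p(x)$ of a typical sequence $x$ in the power space $X^n$ is approximately
$$p(x) \approx 2^{-nH(X)}, \quad x \in W_\varepsilon^{(n)},$$
and the number of codewords $|W_\varepsilon^{(n)}|$ in the typical code $W_\varepsilon^{(n)}$ is approximately
$$|W_\varepsilon^{(n)}| \approx 2^{nH(X)}.$$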
Further analysis shows that the proportion of the typical code $W_\varepsilon^{(n)}$ in the block code $X^n$ is very small, which can be summarized as the following Lemma.

Lemma 3.13 For a given sufficiently small $\varepsilon > 0$, when $X$ is not an equiprobable information space, we have
$$\lim_{n\to\infty} \frac{|W_\varepsilon^{(n)}|}{|X|^n} = 0.$$
Proof By Lemma 3.12, we have
$$\frac{|W_\varepsilon^{(n)}|}{|X|^n} \le \frac{2^{n(H(X) + \varepsilon)}}{|X|^n} = 2^{-n(\log|X| - H(X) - \varepsilon)}.$$
By Theorem 3.1, since $X$ is not an equiprobable information space, $H(X) < \log|X|$; so when $\varepsilon$ is sufficiently small,
$$H(X) + \varepsilon < \log|X|.$$
Therefore, when $n$ is sufficiently large, the ratio $|W_\varepsilon^{(n)}| / |X|^n$ can be made arbitrarily small. Lemma 3.13 holds.


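Combining Lemmas 3.11, 3.12 and 3.13, we can describe the following statistical characteristics of the typical codes in block codes.

Corollary 3.5 Assume that $X$ is a memoryless source and that the typical sequence set (or typical code) $W_\varepsilon^{(n)}$ in the block code $X^n$ is defined by formula (3.55). Then for any $\varepsilon > 0$ and sufficiently large $n$:
(i) (Asymptotic equipartition)
$$(1 - \varepsilon)\, 2^{n(H(X) - \varepsilon)} \le |W_\varepsilon^{(n)}| \le 2^{n(H(X) + \varepsilon)}.$$
(ii) The probability of occurrence $P\{W_\varepsilon^{(n)}\}$ of $W_\varepsilon^{(n)}$ is arbitrarily close to 1, that is,
$$P\{W_\varepsilon^{(n)}\} = P\{x \in X^n : x \in W_\varepsilon^{(n)}\} > 1 - \varepsilon.$$
(iii) When $X$ is not an equiprobable information space and $\varepsilon$ is sufficiently small, the proportion of $W_\varepsilon^{(n)}$ in the block code $X^n$ becomes arbitrarily small, that is,
$$\lim_{n\to\infty} \frac{|W_\varepsilon^{(n)}|}{|X|^n} = 0.$$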
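For a memoryless binary source the typical code can be counted exactly, because $p(x)$ depends only on the number $k$ of ones in $x$. The sketch below (our own illustration, assuming $P\{1\} = p_1$ and base-2 logarithms; it needs Python 3.8+ for `math.comb` and is meant for moderate $n$, say $n \le 1000$, to avoid float overflow) returns $H(X)$, $|W_\varepsilon^{(n)}|$ and $P\{W_\varepsilon^{(n)}\}$, so properties (i) to (iii) of Corollary 3.5 can be checked directly.

```python
import math


def typical_set_stats(p1, n, eps):
    """Exact (H(X), |W_eps^(n)|, P{W_eps^(n)}) for a memoryless binary source, P{1} = p1.

    A string with k ones has probability p1**k * (1 - p1)**(n - k), so membership
    in W_eps^(n) (definition (3.55), logs base 2) depends only on k.
    """
    h = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
    size, prob = 0, 0.0
    for k in range(n + 1):
        log_p = k * math.log2(p1) + (n - k) * math.log2(1 - p1)
        if abs(-log_p / n - h) < eps:
            count = math.comb(n, k)          # number of strings with k ones
            size += count
            prob += count * 2.0 ** log_p
    return h, size, prob


if __name__ == "__main__":
    n, eps = 1000, 0.1
    h, size, prob = typical_set_stats(0.1, n, eps)
    print("log2|W| =", math.log2(size), "bounds:", math.log2(1 - eps) + n * (h - eps), n * (h + eps))
    print("P{W} =", prob)                    # close to 1
    print("log2(|W| / 2^n) =", math.log2(size) - n)   # very negative: W is a tiny fraction
```

For the equiprobable case $p_1 = 1/2$, every string is typical and $|W_\varepsilon^{(n)}| = 2^n$, which is why Lemma 3.13 excludes it.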


The above description of the statistical characteristics of typical codes is an important theoretical basis for source coding, or data compression. It gives an effective way to compress block code information, so that the rearranged codewords are as few as possible and the error probability of decoding and recovery is as small as possible. An effective method is to divide the codewords of the block code $X^n$ into two parts. The codewords of the typical code $W_\varepsilon^{(n)}$ are numbered from 1 to $M$; that is, the codewords in $W_\varepsilon^{(n)}$ are put into one-to-one correspondence with the positive integer set
$$I = \{1, 2, \ldots, M\}, \qquad M = |W_\varepsilon^{(n)}|.$$
All codewords that do not belong to $W_\varepsilon^{(n)}$ are uniformly numbered 1. Obviously, for each $i \in I$ with $i \ne 1$, there is a unique codeword $x^{(i)} \in W_\varepsilon^{(n)}$ with number $i$, so we can restore $i$ to $x^{(i)}$ exactly; that is, $i \to x^{(i)}$ is the correct decoding. For $i = 1$, correct decoding cannot be guaranteed, which results in decoding recovery errors. We call $\frac{1}{n} \log M$ the code rate of the typical code $W_\varepsilon^{(n)}$. By Lemma 3.12,
$$(1 - \varepsilon)\, 2^{n(H(X) - \varepsilon)} \le M \le 2^{n(H(X) + \varepsilon)}.$$
Equivalently,
$$\log(1 - \varepsilon) + n(H(X) - \varepsilon) \le \log M \le n(H(X) + \varepsilon).$$
Therefore, the code rate of the typical code $W_\varepsilon^{(n)}$ is estimated as follows:
$$\frac{1}{n} \log(1 - \varepsilon) + H(X) - \varepsilon \le \frac{1}{n} \log M \le H(X) + \varepsilon. \tag{3.59}$$
For a given $0 < \varepsilon < 1$, we have
$$H(X) - \varepsilon \le \liminf_{n\to\infty} \frac{1}{n} \log M \le \limsup_{n\to\infty} \frac{1}{n} \log M \le H(X) + \varepsilon.$$
In other words, the code rate of the typical code is close to $H(X)$. Let us look at the decoding error probability $P_e$ of this numbering, where
$$P_e = P\{x \in X^n : x \notin W_\varepsilon^{(n)}\}.$$
Because
$$P_e + P\{W_\varepsilon^{(n)}\} = 1,$$
by the statistical characteristic (ii) of the typical code $W_\varepsilon^{(n)}$ in Corollary 3.5,
$$P_e = 1 - P\{W_\varepsilon^{(n)}\} < \varepsilon. \tag{3.60}$$
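The numbering scheme just described can be sketched by brute force for small $n$ (an illustration under our own assumptions: a positive source distribution, base-2 logarithms, and hypothetical function names). Typical sequences get numbers $1, \ldots, M$; every atypical sequence is also sent to 1, so exactly the atypical sequences are decoded wrongly and the error probability equals $1 - P\{W_\varepsilon^{(n)}\}$, as in (3.60).

```python
import itertools
import math


def build_typical_code(probs, n, eps):
    """Number the typical sequences 1..M; the decoder maps each number back to a sequence.

    Enumerates all of X^n, so it is only meant for small n.
    """
    h = -sum(p * math.log2(p) for p in probs)
    typical = [x for x in itertools.product(range(len(probs)), repeat=n)
               if abs(-sum(math.log2(probs[s]) for s in x) / n - h) < eps]
    encode = {x: i for i, x in enumerate(typical, start=1)}
    decode = {i: x for x, i in encode.items()}
    return encode, decode


def error_probability(probs, n, eps):
    """Return (P_e, M) for the scheme in which every atypical sequence is numbered 1."""
    encode, decode = build_typical_code(probs, n, eps)
    p_err = 0.0
    for x in itertools.product(range(len(probs)), repeat=n):
        i = encode.get(x, 1)                     # atypical sequences -> number 1
        if decode.get(i) != x:
            p_err += math.prod(probs[s] for s in x)
    return p_err, len(encode)


if __name__ == "__main__":
    probs, n, eps = [0.7, 0.3], 12, 0.1
    p_err, M = error_probability(probs, n, eps)
    print("M =", M, " rate (1/n)log2 M =", math.log2(M) / n, " P_e =", p_err)
```

At this small $n$ the error probability is still far from small; the theory only guarantees $P_e < \varepsilon$ once $n$ is large.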
From this, we derive the main result of this section, the so-called source coding theorem.

Theorem 3.10 (Shannon, 1948) Assume that $X$ is a memoryless source. Then
(i) When the code rate $R = \frac{1}{n} \log M_1 > H(X)$, there is an encoding with code rate $R$ such that the error probability of decoding recovery satisfies $P_e \to 0$ as $n \to \infty$.
(ii) When the code rate $R = \frac{1}{n} \log M_1 < H(X) - \delta$, where $\delta > 0$ does not change with $n$, any coding with code rate $R$ has $\lim_{n\to\infty} P_e = 1$.


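Proof The above analysis has essentially proved (i). In fact, if
$$R = \frac{1}{n} \log M_1 > H(X),$$
then, taking $\varepsilon$ sufficiently small (namely $\varepsilon < R - H(X)$), by (3.59) the typical code in the block code $X^n$ satisfies
$$R > \frac{1}{n} \log |W_\varepsilon^{(n)}|, \qquad M_1 > |W_\varepsilon^{(n)}|.$$
Therefore, we can construct a code $C \subset X^n$ satisfying
$$W_\varepsilon^{(n)} \subset C, \qquad |C| = M_1.$$
Thus the code rate of $C$ is exactly $R$, and for sufficiently large $n$ the decoding error probability $P_e(C)$ after compression coding satisfies $P_e(C) < \varepsilon$, because the probability of occurrence of $C$ satisfies
$$P\{C\} + P_e(C) = 1$$
and
$$P\{C\} \ge P\{W_\varepsilon^{(n)}\} > 1 - \varepsilon.$$
So (i) holds.

To prove (ii), note that for every $x \in W_\varepsilon^{(n)}$,
$$\left| -\frac{1}{n} \log p(x) - H(X) \right| < \varepsilon,$$
which implies
$$p(x) < 2^{-n(H(X) - \varepsilon)}, \quad \forall\, x \in W_\varepsilon^{(n)}.$$
Thus, for any subset $A \subset W_\varepsilon^{(n)}$, the probability of occurrence of $A$ satisfies
$$P\{A\} = \sum_{x \in A} p(x) \le |A| \cdot 2^{-n(H(X) - \varepsilon)}. \tag{3.61}$$
Now take any coding with code rate $R$. It has $M_1$ codewords, so it decodes correctly at most $M_1$ source sequences; let $C$ be the set of these sequences, so that $|C| \le M_1$ and $1 - P_e = P\{C\}$. Because
$$R = \frac{1}{n} \log M_1 < H(X) - \delta,$$
we have
$$|C \cap W_\varepsilon^{(n)}| \le M_1 < 2^{n(H(X) - \delta)}.$$
By (3.61) and Corollary 3.5 (ii),
$$1 - P_e = P\{C\} \le P\{C \cap W_\varepsilon^{(n)}\} + P\{X^n \setminus W_\varepsilon^{(n)}\} < 2^{-n(\delta - \varepsilon)} + \varepsilon. \tag{3.62}$$
When $0 < \varepsilon < \delta$, the first term tends to 0 as $n \to \infty$, so $\limsup_{n\to\infty} (1 - P_e) \le \varepsilon$. Since $\varepsilon$ can be taken arbitrarily small,
$$\lim_{n\to\infty} P_e = 1.$$
Thus the theorem holds.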


3.6 Optimal Code Theory 


Let $X$ be a source state set and $x = x_1 x_2 \cdots x_n \in X^n$ a message sequence, and suppose that after compression coding $x$ is output as a codeword $u = u_1 u_2 \cdots u_k \in \mathbb{Z}_D^k$ of length $k$, where $D > 1$ is a positive integer and $\mathbb{Z}_D$ is the residue class ring modulo $D$; $u = u_1 u_2 \cdots u_k \in \mathbb{Z}_D^k$ is called a $D$-ary codeword of length $k$. Decoding translates $u$ back into the message $x$, written $u \to x$. The purpose of source coding is to find a good coding scheme that makes the code rate as small as possible while keeping the decoding error sufficiently small. Below, we give rigorous mathematical definitions of equal-length codes and variable-length codes.


Definition 3.14 Let $X$ be a source state set, $\mathbb{Z}_D$ the residue class ring modulo $D$, and $n, k$ positive integers. A mapping $f : X^n \to \mathbb{Z}_D^k$ is called an equal-length coding function, and a mapping $\psi : \mathbb{Z}_D^k \to X^n$ is called the corresponding decoding function. For $x = x_1 \cdots x_n \in X^n$, $f(x) = u = u_1 \cdots u_k \in \mathbb{Z}_D^k$ is called a codeword of length $k$. Let
$$C = \{ f(x) \in \mathbb{Z}_D^k \mid x \in X^n \}. \tag{3.63}$$
$C$ is called the code encoded by $f$, and $R = \frac{k}{n} \log D$ is called the coding rate of $f$, also known as the code rate of $C$. $C$ is called an equal-length code; it is sometimes called a block code with block length $k$.
By Definition 3.14, the error probability of an equal-length coding scheme $(f, \psi)$ is
$$P_e = P\{\psi(f(x)) \ne x,\ x \in X^n\}. \tag{3.64}$$
Let us first consider error-free coding, that is, $P_e = 0$. Obviously (assuming every $x \in X^n$ has positive probability), $P_e = 0$ if and only if $f$ is an injection and $\psi = f^{-1}$ is a left inverse mapping of $f$. A coding function $f : X^n \to \mathbb{Z}_D^k$ can be chosen to be an injection if and only if $D^k \ge |X^n|$, that is, $D^k \ge N^n$, where $N = |X|$. Taking logarithms on both sides,
$$R = \frac{k}{n} \log D \ge \log N = \log |X|. \tag{3.65}$$
Therefore, the code rate of error-free compression coding $f$ is at least $\log_2 |X|$ bits, or $\ln |X|$ nats.
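A small sketch of (3.65), with hypothetical function names: it finds the shortest block length $k$ with $D^k \ge N^n$ using exact integer arithmetic and reports the rate $R = \frac{k}{n}\log_2 D$. For a ternary source and a binary code alphabet, blocking $n$ letters together lets $R$ approach the lower bound $\log_2 3 \approx 1.585$ bits.

```python
import math


def min_block_length(N: int, n: int, D: int = 2) -> int:
    """Smallest k with D**k >= N**n: the shortest error-free equal-length code."""
    k, target, power = 0, N ** n, 1
    while power < target:
        power *= D
        k += 1
    return k


def rate(N: int, n: int, D: int = 2) -> float:
    """Code rate R = (k / n) * log2(D) in bits per source letter."""
    return min_block_length(N, n, D) / n * math.log2(D)


if __name__ == "__main__":
    for n in (1, 5, 100):
        print(n, min_block_length(3, n), rate(3, n))   # rate 2.0, then 1.6, then ~1.59
    print("lower bound log2(3) =", math.log2(3))
```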

We now consider asymptotically error-free coding, that is, for any given $\varepsilon > 0$ we require the decoding error probability $P_e < \varepsilon$. By Theorem 3.10, only a code rate $R > H(X)$ is needed. In fact, regard $X$ as an information space and encode a message sequence $x = x_1 x_2 \cdots x_n \in X^n$ of length $n$ as follows: if $x \in W_\varepsilon^{(n)}$ is a typical sequence (typical code), $x$ corresponds to one of the numbers $1, 2, \ldots, M$, where $M = |W_\varepsilon^{(n)}|$; if $x \notin W_\varepsilon^{(n)}$, $x$ is uniformly coded as 1. If the $M$ numbers of $W_\varepsilon^{(n)}$ are represented by $D$-ary digits, let $D^k = M$ (padding with unused numbers when $M$ is not a power of $D$); then the code rate $R$ is
$$R = \frac{1}{n} \log M = \frac{k}{n} \log D.$$
Since $M$ is approximately $2^{nH(X)}$, $R$ is approximately $H(X)$, that is, $R = \frac{1}{n} \log M \approx H(X)$. By the asymptotic equipartition property, the error probability of such coding is
$$P_e = P\{x = x_1 \cdots x_n \notin W_\varepsilon^{(n)}\} < \varepsilon$$
when $n$ is sufficiently large.

However, in practical applications $n$ cannot increase indefinitely, which requires us to find the best coding scheme for a given finite $n$, so that the code rate is as close as possible to the theoretical value $H(X)$. In applications, equal-length codes turn out not to be an efficient coding scheme, while variable-length codes are more practical. For example:


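Example 3.6 Let $X = \{1, 2, 3, 4\}$ be an information space, and let the probability distribution of the random variable $\xi$ taking values in $X$ be
$$P\{\xi = 1\} = \frac{1}{2}, \quad P\{\xi = 2\} = \frac{1}{4}, \quad P\{\xi = 3\} = \frac{1}{8}, \quad P\{\xi = 4\} = \frac{1}{8}.$$
The entropy $H(X)$ of the information space $X$ is
$$H(X) = \frac{1}{2} \log_2 2 + \frac{1}{4} \log_2 4 + \frac{1}{8} \log_2 8 + \frac{1}{8} \log_2 8 = 1.75 \text{ bits}.$$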
If an equal-length code is used for coding, the code length is 2, and the code is

Source letter | Codeword
1 | 00
2 | 01
3 | 10
4 | 11

Then the code rate ($k = 2$, $n = 1$) is
$$R = 2 \log_2 2 = 2 > 1.75 \text{ bits}.$$
Obviously, the efficiency of the equal-length code is not high. If the above code is replaced with an unequal-length code, such as

Source letter | Codeword
1 | 0
2 | 10
3 | 110
4 | 111

and we use $l(x)$ to denote the code length of the encoded source letter $x$, then the average code length $L$ required for encoding $X$ is
$$L = \sum_{i=1}^{4} p(i)\, l(i) = \frac{1}{2} \times 1 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{8} \times 3 = 1.75 \text{ bits}.$$
It can be seen that encoding $X$ with an unequal-length code is more efficient. This example also illustrates the following principle of compression coding: characters with a high probability of occurrence are assigned shorter codewords, and characters with a low probability of occurrence are assigned longer codewords, so that the average code length is as small as possible.
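The computation in Example 3.6 can be reproduced with a few lines of Python (a sketch; the probabilities are those implied by the entropy computation in the example, and the decoder name is ours). Because the variable-length code is prefix-free, a received bit string can be decoded greedily from left to right.

```python
import math

# Example 3.6: letter probabilities and the two codes from the tables.
probs = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}
fixed_code = {1: "00", 2: "01", 3: "10", 4: "11"}
variable_code = {1: "0", 2: "10", 3: "110", 4: "111"}


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def average_length(probs, code):
    return sum(p * len(code[x]) for x, p in probs.items())


def decode_prefix(bits, code):
    """Decode a concatenation of codewords of a prefix (real-time) code, left to right."""
    inverse = {w: x for x, w in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a complete codeword: no other codeword extends it
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return out


if __name__ == "__main__":
    print(entropy(probs), average_length(probs, fixed_code), average_length(probs, variable_code))
    # -> 1.75 2.0 1.75
    bits = "".join(variable_code[x] for x in [1, 3, 2, 4, 1])
    print(bits, decode_prefix(bits, variable_code))
```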

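Next, we give the mathematical definition of variable-length coding. For this purpose, let $X^*$ and $\mathbb{Z}_D^*$ be the sets of finite-length sequences over $X$ and $\mathbb{Z}_D$, respectively, that is,
$$X^* = \bigcup_{n=1}^{\infty} X^n, \qquad \mathbb{Z}_D^* = \bigcup_{k=1}^{\infty} \mathbb{Z}_D^k.$$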


Definition 3.15 (i) A mapping $f : X^n \to \mathbb{Z}_D^*$ is called a variable-length coding function: for any $x \in X^n$, $f(x) \in \mathbb{Z}_D^*$, and for different $x$ the code lengths of $f(x)$ are not necessarily the same. We use $l(x)$ to denote the length of $f(x)$, which is called the coding length of $x$. $C = \{ f(x) \in \mathbb{Z}_D^* \mid x \in X^n \}$ is called a variable-length codeword set.

(ii) Let $f : X^* \to \mathbb{Z}_D^*$ be a mapping; $f$ is called a coding mapping, and $f(X^*)$ is called a code.

(iii) $f : X^* \to \mathbb{Z}_D^*$ is called a block code mapping if there is a mapping $g : X \to \mathbb{Z}_D^*$ such that for any $x = x_1 x_2 \cdots x_n \in X^n$ ($n \ge 1$), $f(x) = g(x_1) g(x_2) \cdots g(x_n)$.

(iv) $f : X^* \to \mathbb{Z}_D^*$ is called a uniquely decodable mapping if $f$ is a block code mapping and $f$ is an injection.

(v) $f : X^* \to \mathbb{Z}_D^*$ is called a real-time code mapping if $f$ is a block code mapping and, for any distinct letters $x, y \in X$, neither of $f(x)$ and $f(y)$ is a prefix of the other.


Remark 3.1 For $a = a_1 a_2 \cdots a_n \in \mathbb{Z}_D^n$ and $b = b_1 b_2 \cdots b_m \in \mathbb{Z}_D^m$, the codeword $a$ is called a prefix of $b$ if $m \ge n$ and $a_i = b_i$ for all $1 \le i \le n$.


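Lemma 3.14 A block code mapping $f : X^* \to \mathbb{Z}_D^*$ is uniquely decodable if and only if, for every $n \ge 1$, the restriction of $f$ to $X^n$ is an injection.

Proof Necessity is obvious; we prove sufficiency. That is, for $x = x_1 x_2 \cdots x_n \in X^n$ and $y = y_1 y_2 \cdots y_m \in X^m$ with $x \ne y$, we show $f(x) \ne f(y)$. Suppose $f(x) = f(y)$. Because $f$ is a block code mapping, there is a mapping $g : X \to \mathbb{Z}_D^*$ such that
$$f(x) = g(x_1) g(x_2) \cdots g(x_n) = g(y_1) g(y_2) \cdots g(y_m) = f(y).$$
Then
$$f(xy) = g(x_1) \cdots g(x_n)\, g(y_1) \cdots g(y_m) = g(y_1) \cdots g(y_m)\, g(x_1) \cdots g(x_n) = f(yx).$$
If $xy \ne yx$, this contradicts the fact that $f$ is an injection on $X^{n+m}$. If $xy = yx$, then $x$ and $y$ are powers of a common word $w$, say $x = w^a$ and $y = w^b$ with $a \ne b$; since every codeword is nonempty, $f(x)$ and $f(y)$ then have different lengths, again a contradiction. The Lemma holds.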


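Lemma 3.15 A real-time code is uniquely decodable, but not vice versa.

Proof Suppose $f : X^* \to \mathbb{Z}_D^*$ is a real-time code mapping with letter code $g$, and let $x = x_1 \cdots x_n$ and $y = y_1 \cdots y_m$ be distinct elements of $X^*$. Suppose $f(x) = f(y)$. If $x_i = y_i$ for all $i \le \min(n, m)$, then one of $x, y$ is a proper prefix of the other, and since codewords are nonempty, $f(x)$ and $f(y)$ have different lengths, a contradiction. Otherwise, let $j$ be the first index with $x_j \ne y_j$; cancelling the common part $g(x_1) \cdots g(x_{j-1})$ gives $g(x_j) \cdots g(x_n) = g(y_j) \cdots g(y_m)$, so one of $g(x_j)$, $g(y_j)$ is a prefix of the other, contradicting the real-time property. Hence $f(x) \ne f(y)$, that is, $f$ is an injection. For the converse, consider the following counterexample:

Source letter | Codeword
1 | 0
2 | 01
3 | 011
4 | 111

where $X = \{1, 2, 3, 4\}$ is the information space and $f : X \to \mathbb{Z}_2^*$ is a variable-length code. Since $f(1)$ is a prefix of $f(2)$, $f$ is not a real-time code mapping; but $f$ is uniquely decodable, because no codeword is a suffix of another, so a received sequence can be decoded uniquely from right to left. The Lemma holds.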


What conditions must the code lengths of a real-time code satisfy? The following Kraft inequality gives a satisfactory answer.

Lemma 3.16 For a uniquely decodable code $C$ with values in $\mathbb{Z}_D^*$, $|C| = m$, with code lengths $l_1, l_2, \ldots, l_m$, the following McMillan-Kraft inequality holds:
$$\sum_{i=1}^{m} D^{-l_i} \le 1. \tag{3.66}$$
Conversely, if the $l_i$ satisfy the above condition, there is a real-time code $C$ whose set of code lengths is $\{l_1, l_2, \ldots, l_m\}$.


Proof Consider
$$\left( \sum_{i=1}^{m} D^{-l_i} \right)^n = \left( D^{-l_1} + D^{-l_2} + \cdots + D^{-l_m} \right)^n;$$
each term of the expansion has the form $D^{-l_{i_1}} D^{-l_{i_2}} \cdots D^{-l_{i_n}} = D^{-k}$, where $l_{i_1} + l_{i_2} + \cdots + l_{i_n} = k$. Let $l = \max\{l_1, l_2, \ldots, l_m\}$; then $k$ ranges from $n$ to $nl$. Let $N_k$ be the number of terms equal to $D^{-k}$; then
$$\left( \sum_{i=1}^{m} D^{-l_i} \right)^n = \sum_{k=n}^{nl} N_k D^{-k}.$$
Note that $N_k$ can be regarded as the number of sequences of $n$ codewords of $C$ whose total length is exactly $k$, i.e.,
$$N_k = \left| \{ (c_1, c_2, \ldots, c_n) \mid |c_1 c_2 \cdots c_n| = k,\ c_i \in C \} \right|.$$
The concatenation $c_1 c_2 \cdots c_n$ lies in $\mathbb{Z}_D^k$, and because $f : X^* \to \mathbb{Z}_D^*$ is an injection, $N_k \le D^k$. Then we have
$$\left( \sum_{i=1}^{m} D^{-l_i} \right)^n \le \sum_{k=n}^{nl} D^k D^{-k} = nl - n + 1 \le nl.$$
If $x = \sum_{i=1}^{m} D^{-l_i} > 1$, then $x^n > nl$ when $n$ is sufficiently large; but the above inequality holds for every $n$. That is, $\sum_{i=1}^{m} D^{-l_i} \le 1$.

Conversely, assume the Kraft inequality holds, that is, the given lengths $l_i$ ($1 \le i \le m$) satisfy (3.66); we need to construct a real-time code with these lengths, where the $l_i$ need not be distinct. Let $n_j$ be the number of codewords of length $j$, and $l = \max\{l_1, l_2, \ldots, l_m\}$. Then (3.66) is equivalent to
$$\sum_{j=1}^{l} n_j D^{-j} \le 1.$$
Multiplying both sides by $D^l$, we get $\sum_{j=1}^{l} n_j D^{l-j} \le D^l$. Since all terms are nonnegative, keeping only the terms with $j \le t$ and dividing by $D^{l-t}$ gives, for each $1 \le t \le l$,
$$n_1 D^{t-1} + n_2 D^{t-2} + \cdots + n_t \le D^t,$$
that is, $n_1 \le D$, $n_2 \le D^2 - n_1 D$, $n_3 \le D^3 - n_1 D^2 - n_2 D$, and so on.
Because $n_1 \le D$, we can choose these $n_1$ codewords of length 1 arbitrarily, and the remaining $D - n_1$ words of length 1 can be used as prefixes of the other codewords. Therefore, there are $(D - n_1) D$ choices for codewords of length 2, and indeed $n_2 \le D^2 - n_1 D$. Similarly, the $(D - n_1) D - n_2$ unused words of length 2 can be used as prefixes of subsequent codewords, so there are $((D - n_1) D - n_2) D$ choices for codewords of length 3, and $n_3 \le D^3 - n_1 D^2 - n_2 D$. Continuing in this way, we can always construct a real-time code with lengths $\{l_1, l_2, \ldots, l_m\}$. The Lemma holds.
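The converse half of Lemma 3.16 is constructive. The sketch below uses the common "canonical" assignment, a standard variant of the construction in the proof (not necessarily the book's exact procedure): process lengths in increasing order, and give each codeword the next unused $D$-ary integer, extended to the required length. Kraft's inequality guarantees that each value still fits in the required number of digits. Function names are hypothetical, and digits are written as characters, so $D \le 10$ is assumed.

```python
from fractions import Fraction


def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality (3.66), computed exactly."""
    return sum(Fraction(1, D ** l) for l in lengths)


def to_digits(value, D, length):
    """Write value in base D with exactly `length` digits (assumes D <= 10)."""
    digits = []
    for _ in range(length):
        digits.append(str(value % D))
        value //= D
    return "".join(reversed(digits))


def build_prefix_code(lengths, D=2):
    """Construct a D-ary prefix (real-time) code with the given codeword lengths."""
    if any(l < 1 for l in lengths):
        raise ValueError("lengths must be positive")
    if kraft_sum(lengths, D) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codes = [None] * len(lengths)
    value, prev = 0, 0
    for i in sorted(range(len(lengths)), key=lambda j: lengths[j]):
        value *= D ** (lengths[i] - prev)   # extend the next free word to the new length
        codes[i] = to_digits(value, D, lengths[i])
        value += 1
        prev = lengths[i]
    return codes


if __name__ == "__main__":
    print(build_prefix_code([1, 2, 3, 3]))            # ['0', '10', '110', '111']
    print(build_prefix_code([1, 1, 2, 2, 2], D=3))    # ['0', '1', '20', '21', '22']
```

With lengths $(1, 2, 3, 3)$ this reproduces the variable-length code of Example 3.6.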


Let us give an example of a code that is not uniquely decodable.
Example 3.7 Let $X = \{1, 2, 3, 4\}$ and $\mathbb{Z}_D = \mathbb{F}_2$, with the coding scheme

Source letter | Codeword
1 | 0 = f(1)
2 | 1 = f(2)
3 | 00 = f(3)
4 | 11 = f(4)

Because the encoder outputs, and the decoder receives, a continuous stream of code symbols, if the decoder receives 001101, there are at least two possible decoding results, 112212 and 3412. This shows that $f$, extended to $X^*$, is not an injection, that is, the code generated by $f$ is not uniquely decodable.
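Whether a given finite code is uniquely decodable can be decided mechanically. The sketch below uses the Sardinas-Patterson test, a standard algorithm that is not described in this text: starting from the "dangling suffixes" left when one codeword is a prefix of another, it repeatedly forms new dangling suffixes, and the code is uniquely decodable if and only if no such suffix is itself a codeword. Function names are our own; codewords are strings of digits.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword (real-time code)."""
    return not any(a != b and b.startswith(a) for a in code for b in code)


def _dangling(prefixes, words):
    """Nonempty w such that p + w == u for some p in prefixes and u in words."""
    return {u[len(p):] for p in prefixes for u in words
            if len(u) > len(p) and u.startswith(p)}


def is_uniquely_decodable(code):
    """Sardinas-Patterson test: uniquely decodable iff no dangling-suffix set
    S_1, S_2, ... contains a codeword."""
    C = set(code)
    if len(C) != len(code):              # a repeated codeword already breaks injectivity
        return False
    current, seen = _dangling(C, C), set()
    while current:
        if current & C:
            return False
        key = frozenset(current)
        if key in seen:                  # the sequence of suffix sets has started to repeat
            return True
        seen.add(key)
        current = _dangling(C, current) | _dangling(current, C)
    return True


if __name__ == "__main__":
    for code in (["0", "10", "110", "111"],    # Example 3.6: real-time
                 ["0", "01", "011", "111"],    # Lemma 3.15: uniquely decodable, not real-time
                 ["0", "1", "00", "11"]):      # Example 3.7: not uniquely decodable
        print(code, is_prefix_free(code), is_uniquely_decodable(code))
```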


By Lemma 3.16, real-time codes, and more generally uniquely decodable codes, must satisfy the Kraft inequality. However, a variable-length code whose lengths merely satisfy the Kraft inequality need not be optimal: from the perspective of random coding, an optimal code requires not only correct decoding but also efficiency, that is, the shortest average random code length. We state the rigorous mathematical definition of an optimal code as follows.


Definition 3.16 Let $X = \{x_1, x_2, \ldots, x_m\}$ be an information space. A real-time code $C = \{f(x_1), f(x_2), \ldots, f(x_m)\}$ is called an optimal code if its average random code length
$$L = \sum_{i=1}^{m} p_i l_i \tag{3.67}$$
is the smallest, where $p_i = p(x_i)$ is the occurrence probability of $x_i$ and $l_i$ is the code length of $f(x_i)$.


For a source state set $X$, once its statistical characteristics are determined, that is, once $X$ becomes an information space, the probability distribution $\{p(x) \mid x \in X\}$ is given. Therefore, finding the optimal compression coding scheme for an information space $X$ amounts to finding the optimal solution $\{l_1, l_2, \ldots, l_m\}$ of (3.67) subject to the Kraft inequality. Usually, we use the Lagrange multiplier method, treating the $l_i$ as real variables. Let
$$J = \sum_{i=1}^{m} p_i l_i + \lambda \left( \sum_{i=1}^{m} D^{-l_i} \right).$$
Taking the partial derivative of $J$ with respect to $l_i$,
$$\frac{\partial J}{\partial l_i} = p_i - \lambda D^{-l_i} \ln D.$$
Setting it to zero gives
$$D^{-l_i} = \frac{p_i}{\lambda \ln D}.$$
Taking the Kraft inequality with equality, that is,
$$\sum_{i=1}^{m} D^{-l_i} = 1,$$
we get
$$1 = \sum_{i=1}^{m} \frac{p_i}{\lambda \ln D} = \frac{1}{\lambda \ln D},$$
so $\lambda = 1/\ln D$ and $p_i = D^{-l_i}$. Thus, the optimal code lengths $l_i^*$ are
$$l_i^* = -\log_D p_i, \qquad p_i = D^{-l_i^*}. \tag{3.68}$$
The corresponding optimal average code length $L^*$ is
$$L^* = \sum_{i=1}^{m} p_i l_i^* = -\sum_{i=1}^{m} p_i \log_D p_i = H_D(X). \tag{3.69}$$
That is, $L^*$ is the $D$-ary information entropy $H_D(X)$ of $X$. From this, we get the main result of this section.


Theorem 3.11 The average length $L$ of any $D$-ary real-time code for an information space $X$ satisfies
$$L \ge H_D(X).$$
Equality holds if and only if $p_i = D^{-l_i}$ for all $i$.


Next, we give another proof of Theorem 3.11. To this end, consider two random variables $\xi$ and $\eta$ on a source state set $X$, with probability distributions
$$p(x) = P\{\xi = x\}, \qquad q(x) = P\{\eta = x\}, \quad \forall\, x \in X.$$
The relative entropy of these random variables is defined as
$$D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}. \tag{3.70}$$

Lemma 3.17 The relative entropy $D(p\|q)$ of two random variables on $X$ satisfies

$$D(p\|q) \ge 0, \quad \text{and} \quad D(p\|q) = 0 \iff p(x) = q(x), \ \forall x \in X.$$


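Proof For every real number $x > 0$, Taylor's formula for $e^{x-1}$ at $x = 1$ gives

$$e^{x-1} = 1 + (x-1) + \frac{e^{\theta(x-1)}}{2!}(x-1)^2, \quad 0 < \theta < 1.$$

Thus $e^{x-1} \ge x$, with equality only when $x = 1$; that is, $\log x \le x - 1$ for the natural logarithm (for a logarithm to another base $b > 1$, $\log_b x \le (x-1)/\ln b$, which does not change the sign argument below). By (3.70),

$$-D(p\|q) = \sum_{x \in X} p(x) \log \frac{q(x)}{p(x)} \le \sum_{x \in X} p(x)\left(\frac{q(x)}{p(x)} - 1\right) = 0.$$

Thus $D(p\|q) \ge 0$, and equality holds exactly when $q(x)/p(x) = 1$ for every $x \in X$, which gives the condition for $D(p\|q) = 0$.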
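Lemma 3.17 is easy to check numerically. The following minimal Python sketch evaluates (3.70) with base-2 logarithms; the function name `relative_entropy` and the sample distributions are illustrative choices, not taken from the text. It also shows that $D(p\|q)$ is not symmetric in $p$ and $q$.

```python
import math

def relative_entropy(p, q, base=2):
    """D(p||q) = sum over x of p(x) * log(p(x) / q(x)), formula (3.70).

    Terms with p(x) = 0 contribute 0; q(x) must be positive wherever p(x) > 0.
    """
    return sum(px * math.log(px / qx, base) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(relative_entropy(p, q))   # strictly positive, because p != q
print(relative_entropy(p, p))   # 0.0, because the distributions coincide
print(relative_entropy(q, p))   # also positive, but a different value
```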


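Proof (Another proof of Theorem 3.11) Consider $L - H_D(X)$:

$$L - H_D(X) = \sum_{i=1}^{m} p_i l_i - \sum_{i=1}^{m} p_i \log_D \frac{1}{p_i} = -\sum_{i=1}^{m} p_i \log_D D^{-l_i} + \sum_{i=1}^{m} p_i \log_D p_i. \tag{3.71}$$

Define

$$c = \sum_{i=1}^{m} D^{-l_i}, \qquad r_i = \frac{D^{-l_i}}{c}, \quad 1 \le i \le m.$$

By the Kraft inequality, we have

$$c \le 1, \quad \text{and} \quad \sum_{i=1}^{m} r_i = 1.$$

Therefore, $\{r_i, 1 \le i \le m\}$ is a probability distribution on $X$. By (3.71),

$$L - H_D(X) = -\sum_{i=1}^{m} p_i \log_D (c\, r_i) + \sum_{i=1}^{m} p_i \log_D p_i = \sum_{i=1}^{m} p_i \log_D \frac{p_i}{r_i} + \log_D \frac{1}{c} = D(p\|r) + \log_D \frac{1}{c}.$$

By Lemma 3.17 and $c \le 1$, we have

$$L - H_D(X) \ge 0, \quad \text{and} \quad L = H_D(X) \text{ if and only if } c = 1 \text{ and } r_i = p_i,$$

that is,

$$p_i = D^{-l_i}, \quad \text{or} \quad l_i = \log_D \frac{1}{p_i}.$$

This completes the proof of Theorem 3.11.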
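The identity $L - H_D(X) = D(p\|r) + \log_D \frac{1}{c}$ used in this proof can be checked on concrete code lengths. The sketch below is illustrative only: the function name `kraft_identity` and the sample distributions and lengths are assumptions, chosen so that one code has $c < 1$ and the other has $p_i = 2^{-l_i}$ exactly.

```python
import math

def kraft_identity(p, lengths, D=2):
    """Return both sides of L - H_D(X) = D(p||r) + log_D(1/c) for given code lengths."""
    c = sum(D ** -l for l in lengths)                 # Kraft sum, c <= 1
    r = [D ** -l / c for l in lengths]                # r_i = D^(-l_i) / c
    L = sum(pi * li for pi, li in zip(p, lengths))    # average code length (3.67)
    H = -sum(pi * math.log(pi, D) for pi in p if pi > 0)
    div = sum(pi * math.log(pi / ri, D) for pi, ri in zip(p, r) if pi > 0)
    return L - H, div + math.log(1 / c, D)

print(kraft_identity([0.4, 0.3, 0.2, 0.1], [2, 2, 3, 3]))        # c = 3/4 < 1
print(kraft_identity([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3]))   # p_i = 2^(-l_i)
```

In the second call both sides are 0 up to rounding, matching the equality case of Theorem 3.11.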


By Theorem 3.11, if we code according to probability, the code length of the $D$-ary optimal code is

$$l_i = \log_D \frac{1}{p_i}, \quad 1 \le i \le m.$$


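In general, however, $\log_D \frac{1}{p_i}$ is not an integer. We use $\lceil a \rceil$ to denote the smallest integer not less than the real number $a$, and take

$$l_i = \left\lceil \log_D \frac{1}{p_i} \right\rceil, \quad 1 \le i \le m. \tag{3.72}$$

Then, since $l_i \ge \log_D \frac{1}{p_i}$,

$$\sum_{i=1}^{m} D^{-l_i} \le \sum_{i=1}^{m} D^{-\log_D (1/p_i)} = \sum_{i=1}^{m} p_i = 1.$$

Hence the code lengths $\{l_1, l_2, \ldots, l_m\}$ defined by formula (3.72) satisfy the Kraft inequality, and by Lemma 3.16 we can construct the corresponding real-time code.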


Definition 3.17 Let $X = \{x_1, x_2, \ldots, x_m\}$ be an information space, $p_i = p(x_i)$, and

$$l(f(x_i)) = l_i = \left\lceil \log_D \frac{1}{p_i} \right\rceil, \quad 1 \le i \le m.$$

Then the real-time code corresponding to $\{l_1, l_2, \ldots, l_m\}$ is called the Shannon code.


Corollary 3.6 The code length $l(f(x_i))$ of a Shannon code $C = \{f(x_i) \mid 1 \le i \le m\}$ satisfies

$$l_i = \left\lceil \log_D \frac{1}{p(x_i)} \right\rceil, \quad \log_D \frac{1}{p(x_i)} \le l_i < \log_D \frac{1}{p(x_i)} + 1 \tag{3.73}$$

and

$$H_D(X) \le L < H_D(X) + 1,$$

where $L$ is the average code length of $C$.


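Proof By the definition of $\lceil a \rceil$, $a \le \lceil a \rceil < a + 1$; thus

$$\log_D \frac{1}{p(x_i)} \le l_i < \log_D \frac{1}{p(x_i)} + 1.$$

Multiplying both sides by $p(x_i)$ and summing over $1 \le i \le m$, we get

$$\sum_{i=1}^{m} p(x_i) \log_D \frac{1}{p(x_i)} \le \sum_{i=1}^{m} p(x_i) l_i < \sum_{i=1}^{m} p(x_i)\left(\log_D \frac{1}{p(x_i)} + 1\right).$$

That is,

$$H_D(X) \le L < H_D(X) + 1.$$

The Corollary holds.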
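As a small worked check of (3.72) and Corollary 3.6, the following sketch computes binary Shannon code lengths for the five-letter distribution used in Sect. 3.7.2, then the Kraft sum, the average length and the entropy. The function name `shannon_lengths` is an illustrative assumption; only code lengths are computed, not the codewords themselves.

```python
import math

def shannon_lengths(p, D=2):
    """Shannon code lengths l_i = ceil(log_D(1/p_i)), formula (3.72)."""
    # round() guards against floating-point results such as 2.0000000000000004,
    # which ceil() would otherwise push up to the next integer.
    return [math.ceil(round(math.log(1 / pi, D), 12)) for pi in p]

p = [0.25, 0.25, 0.2, 0.15, 0.15]
l = shannon_lengths(p)
kraft = sum(2 ** -li for li in l)                  # must not exceed 1
L = sum(pi * li for pi, li in zip(p, l))           # average code length
H = -sum(pi * math.log2(pi) for pi in p)           # binary entropy H_2(X)
print(l, kraft, L, H)                              # Corollary 3.6: H <= L < H + 1
```

Here the lengths are 2, 2, 3, 3, 3, the Kraft sum is 0.875, and the average length 2.5 lies between $H_2(X) \approx 2.285$ and $H_2(X) + 1$.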


3.7 Several Examples of Compression Coding 


3.7.1 Morse Codes 


In variable length codes, to make the average code length as close to the source entropy as possible, the code length should match the occurrence probability of the corresponding coded character as closely as possible. The principle of probabilistic coding is to assign short codewords to characters with high occurrence probability and long codewords to characters with low occurrence probability, so that the average code length comes as close to the source entropy as possible. This idea existed long before Shannon's theory. For example, Morse code, invented in 1838, uses three symbols, the dot, the dash and the space, to encode the 26 letters of English. Expressed in binary, a dot is 10 (2 bits), a dash is 1110 (4 bits), and a space is 000 (3 bits). The commonly used English letter E is represented by a single dot, while the infrequently used letter Q is represented by two dashes, one dot and one dash; this makes the average codeword length of English text shorter. However, Morse code does not match the occurrence probabilities exactly, so it is not an optimal code, and it is basically no longer used. The coding table of Morse code is shown in Fig. 3.1.
Fig. 3.1 The coding table of Morse code (letters A to Z)


It is worth noting that in its early days Morse code was also used as a kind of cipher, widely employed in the transmission and storage of sensitive political information (such as military intelligence). Early cipher machines were also built on the principle of Morse code, which quickly mechanized the encryption and decryption of messages. In this sense, Morse code played an important role in promoting the development of cryptography.
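The binary convention described above for Morse code (dot as 10, dash as 1110, letter space as 000) can be made concrete with a short sketch. Only a handful of letters of the International Morse alphabet are included, and it is assumed that every letter is followed by exactly one letter space; the names `MORSE` and `morse_bits` are illustrative.

```python
# A few letters of the International Morse alphabet ("." = dot, "-" = dash).
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "Q": "--.-", "Z": "--.."}

# Binary convention from the text: dot -> 10, dash -> 1110,
# and the space that ends a letter -> 000.
SYMBOL_BITS = {".": "10", "-": "1110"}
LETTER_SPACE = "000"

def morse_bits(letter):
    """Binary string for one letter, including its trailing letter space."""
    return "".join(SYMBOL_BITS[s] for s in MORSE[letter]) + LETTER_SPACE

for ch in "ETQ":
    bits = morse_bits(ch)
    print(ch, MORSE[ch], bits, len(bits))
```

Under this convention the frequent letter E costs 5 bits while the rare letter Q costs 17 bits, which is the probability matching the paragraph describes.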
3.7.2 Huffman Codes


Shannon, Fano and Huffman all studied coding methods for variable length codes, among which Huffman codes have the highest coding efficiency. We focus on the construction of binary and ternary Huffman codes.

Let $X = \{x_1, x_2, \ldots, x_m\}$ be a source alphabet of $m$ symbols. Arrange the $m$ symbols in order of occurrence probability, take the two letters with the lowest probabilities and assign them the digits “0” and “1,” respectively, then add their probabilities to form a new letter and re-sort it together with the source letters that have not yet been assigned digits. Again take the two letters with the lowest probabilities, assign them “0” and “1,” respectively, add their probabilities as the probability of a new letter, and re-sort; continue this process until the probabilities of the remaining letters add up to 1. At this point, each source letter corresponds to a string of “0”s and “1”s (read from the final merge back down to the letter), and we obtain a variable length code, which is called the Huffman code. As an example, take the information space $X = \{1, 2, 3, 4, 5\}$ with probability distribution

$$X = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0.25 & 0.25 & 0.2 & 0.15 & 0.15 \end{pmatrix}.$$


The binary information entropy $H_2(X)$ and the ternary information entropy $H_3(X)$ are, respectively,

$$\begin{aligned}
H_2(X) &= -0.25\log_2 0.25 - 0.25\log_2 0.25 - 0.2\log_2 0.2 - 0.15\log_2 0.15 - 0.15\log_2 0.15 \\
&\approx 2.285 \text{ bits}, \\
H_3(X) &= -0.25\log_3 0.25 - 0.25\log_3 0.25 - 0.2\log_3 0.2 - 0.15\log_3 0.15 - 0.15\log_3 0.15 \\
&\approx 1.442 \text{ ternary units}.
\end{aligned}$$

The binary Huffman coding diagram of $X$ is shown in Fig. 3.2.

Fig. 3.2 The binary Huffman coding

Source letter | Probability | Codeword | Codeword length
1 | 0.25 | 00 | 2
2 | 0.25 | 01 | 2
3 | 0.20 | 10 | 2
4 | 0.15 | 110 | 3
5 | 0.15 | 111 | 3

The ternary Huffman coding diagram of $X$ is shown in Fig. 3.3.

Fig. 3.3 The ternary Huffman coding

Source letter | Probability | Codeword | Codeword length
1 | 0.25 | 0 | 1
2 | 0.25 | 1 | 1
3 | 0.20 | 20 | 2
4 | 0.15 | 21 | 2
5 | 0.15 | 22 | 2

In summary, the Huffman code has the following characteristics. Suppose the occurrence probability of the $i$-th source letter is $p_i$ and the corresponding code length is $l_i$. Then

(1) If $p_i > p_j$, then $l_i \le l_j$; that is, a source letter with lower probability has a codeword at least as long;

(2) The two longest codewords have the same code length;

(3) The two longest codewords differ only in the last letter, and all preceding letters are the same;

(4) Among real-time codes, the Huffman code has the smallest average code length. In this sense, the Huffman code is an optimal code.
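The merging procedure can be sketched in Python. This is a minimal illustration rather than a production encoder: it returns only codeword lengths, breaks ties by insertion order (other tie-breaks give different but equally short codes), and for $D > 2$ pads the alphabet with zero-probability dummy letters so that every merge combines exactly $D$ nodes, a standard step the text does not spell out. The function name `huffman_lengths` is an assumption. On the example distribution it reproduces the code lengths of Figs. 3.2 and 3.3.

```python
import heapq
import itertools
import math

def huffman_lengths(probs, D=2):
    """Codeword lengths of a D-ary Huffman code (D >= 2) for the given probabilities."""
    lengths = [0] * len(probs)
    tie = itertools.count()          # tie-breaker, so heapq never compares the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    # Pad with dummy letters of probability 0 until (number of nodes - 1) is a
    # multiple of D - 1, so that every merge combines exactly D nodes.
    while (len(heap) - 1) % (D - 1) != 0:
        heap.append((0.0, next(tie), []))
    heapq.heapify(heap)
    while len(heap) > 1:
        total, leaves = 0.0, []
        for _ in range(D):                         # the D least probable nodes
            p, _order, members = heapq.heappop(heap)
            total += p
            leaves += members
        for i in leaves:                           # each letter below gains one digit
            lengths[i] += 1
        heapq.heappush(heap, (total, next(tie), leaves))
    return lengths

probs = [0.25, 0.25, 0.2, 0.15, 0.15]
for D in (2, 3):
    l = huffman_lengths(probs, D)
    avg = sum(p * n for p, n in zip(probs, l))      # average code length
    H = -sum(p * math.log(p, D) for p in probs)     # D-ary entropy H_D(X)
    print(D, l, round(avg, 3), round(H, 3))
```

The binary code has average length 2.3 bits against $H_2(X) \approx 2.285$, and the ternary code has average length 1.5 against $H_3(X) \approx 1.442$, both within one unit of the entropy.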


Huffman codes have been applied in practice, mainly in compression standards for fax images. However, in actual data compression, the statistical characteristics of some sources change over time. To make the statistics on which the coding is based adapt to the changing statistics of the source, adaptive coding techniques have been developed: at each coding step, the code for a new message is based on the statistical characteristics of the previous messages. For example, R. G. Gallager first proposed a step-by-step updating technique for Huffman codes in 1978, and D. E. Knuth turned this technique into a practical algorithm in 1985. Adaptive Huffman coding requires complex data structures and continuous updating of the codeword set according to the statistical characteristics of the source, so we do not go into details here.


3.7.3 Shannon-Fano Codes 


The Shannon-Fano code is an arithmetic code. Let $X$ be an information space. By Corollary 3.6 in the previous section, the code length of the Shannon code on $X$ is

$$l(x) = \left\lceil \log \frac{1}{p(x)} \right\rceil, \quad \forall x \in X.$$

Here, we introduce a constructive coding method that uses the cumulative distribution function to allocate codewords, commonly known as the Shannon-Fano coding method. Without loss of generality, assume $p(x) > 0$ for every letter $x$ in $X$, and define the cumulative distribution function $F(x)$ and the modified distribution function $\bar{F}(x)$ as


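$$F(x) = \sum_{a \le x} p(a), \qquad \bar{F}(x) = \sum_{a < x} p(a) + \frac{1}{2} p(x), \tag{3.74}$$

where $X = \{1, 2, \ldots, m\}$ is a given information space. Without loss of generality, let

$$p(1) \le p(2) \le \cdots \le p(m).$$

As can be seen from the definition, for $x \in X$ we have $p(x) = F(x) - F(x-1)$; in particular, if $x, y \in X$ and $x \ne y$, then

$$\bar{F}(x) \ne \bar{F}(y).$$

So when we know $\bar{F}(x)$, we can recover the corresponding $x$. The basic idea of the Shannon-Fano arithmetic code is to use $\bar{F}(x)$ to encode $x$. Because $\bar{F}(x)$ is a real number, we keep only the first $l(x)$ bits of its binary expansion, denoted $\lfloor \bar{F}(x) \rfloor_{l(x)}$; then

$$\bar{F}(x) - \lfloor \bar{F}(x) \rfloor_{l(x)} < \frac{1}{2^{l(x)}}. \tag{3.75}$$

Take $l(x) = \left\lceil \log \frac{1}{p(x)} \right\rceil + 1$; then

$$\frac{1}{2^{l(x)}} = \frac{1}{2} \cdot \frac{1}{2^{\lceil \log \frac{1}{p(x)} \rceil}} \le \frac{p(x)}{2} = \bar{F}(x) - F(x-1). \tag{3.76}$$

Now let the binary expansion of $\bar{F}(x)$ be

$$\bar{F}(x) = 0.a_1 a_2 \cdots a_{l(x)} a_{l(x)+1} \cdots, \quad a_i \in \mathbb{F}_2.$$

Then the Shannon-Fano code is

$$f(x) = a_1 a_2 \cdots a_{l(x)}, \quad \text{that is,} \quad x \mapsto a_1 a_2 \cdots a_{l(x)} \in \mathbb{F}_2^{l(x)}. \tag{3.77}$$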


Lemma 3.18 The binary Shannon-Fano code is a real-time code, and its average length $L$ exceeds the theoretical optimal value $H(X)$ by less than two bits.


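Proof By (3.76),

$$\frac{1}{2^{l(x)}} \le \frac{1}{2} p(x).$$

Let the binary expansion of $\bar{F}(x)$ be $\bar{F}(x) = 0.a_1 a_2 \cdots a_{l(x)} \cdots$, $a_i \in \mathbb{F}_2$. We use $[A, B)$ to denote a half-open interval on the real axis, so that

$$\bar{F}(x) \in \left[0.a_1 a_2 \cdots a_{l(x)},\ 0.a_1 a_2 \cdots a_{l(x)} + \frac{1}{2^{l(x)}}\right).$$

If $y \in X$, $x \ne y$, and $f(x)$ is a prefix of $f(y)$, then the first $l(x)$ bits of $\bar{F}(y)$ are also $a_1 a_2 \cdots a_{l(x)}$, so

$$\bar{F}(y) \in \left[0.a_1 a_2 \cdots a_{l(x)},\ 0.a_1 a_2 \cdots a_{l(x)} + \frac{1}{2^{l(x)}}\right).$$

But

$$|\bar{F}(y) - \bar{F}(x)| \ge \frac{1}{2} p(y) + \frac{1}{2} p(x) > \frac{1}{2} p(x) \ge \frac{1}{2^{l(x)}}.$$

This contradicts the fact that $\bar{F}(x)$ and $\bar{F}(y)$ lie in the same interval of length $2^{-l(x)}$. Therefore $f$ is a real-time code, that is, the Shannon-Fano code is a real-time code. Considering its average code length $L$,

$$L = \sum_{x \in X} p(x) l(x) = \sum_{x \in X} p(x)\left(\left\lceil \log \frac{1}{p(x)} \right\rceil + 1\right) < \sum_{x \in X} p(x)\left(\log \frac{1}{p(x)} + 2\right) = H(X) + 2.$$

This completes the proof of the Lemma.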
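The construction (3.74) to (3.77) can be sketched with exact rational arithmetic, which avoids floating-point error when truncating $\bar{F}(x)$ to its first $l(x)$ binary digits. The sketch assumes the probabilities are given as Python `Fraction`s in the order $p(1) \le \cdots \le p(m)$; the helper names `shannon_fano_code` and `is_prefix_free` are illustrative. Other texts call this construction Shannon-Fano-Elias coding.

```python
import math
from fractions import Fraction

def shannon_fano_code(probs):
    """Codewords (3.77): the first l(x) binary digits of Fbar(x), x = 1, ..., m.

    probs: exact positive probabilities p(1), ..., p(m) given as Fractions.
    """
    codes, F_prev = [], Fraction(0)                 # F_prev = F(x - 1)
    for p in probs:
        fbar = F_prev + p / 2                       # modified distribution (3.74)
        l = math.ceil(math.log2(float(1 / p))) + 1  # l(x) = ceil(log 1/p(x)) + 1
        digits = math.floor(fbar * 2 ** l)          # first l binary digits of fbar
        codes.append(format(digits, "0{}b".format(l)))
        F_prev += p
    return codes

def is_prefix_free(codes):
    """True if no codeword is a prefix of a different codeword."""
    return not any(i != j and codes[j].startswith(codes[i])
                   for i in range(len(codes)) for j in range(len(codes)))

probs = [Fraction(1, 8), Fraction(1, 8), Fraction(1, 4), Fraction(1, 2)]
codes = shannon_fano_code(probs)
L = sum(p * len(c) for p, c in zip(probs, codes))   # average length, exactly
print(codes, is_prefix_free(codes), L)
```

For this distribution the codewords are 0001, 0011, 011, 11, and the average length $11/4$ is below $H(X) + 2 = 3.75$, in line with Lemma 3.18.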


Let n > 1, X" is the power space of the information space, x = x1--- x, € X" 
is called a message column of length n. In order to improve the coding efficiency, 
it is often necessary to compress the power space X", which is called arithmetic 
coding. Shannon-Fano code can also be used as arithmetic coding. Its basic method 
is to find a fast algorithm for calculating joint probability distribution p(x, x2 - - - Xn) 
and cumulative distribution function F (x), and then use Shannon-Fano method to 
encode x = x; -+ - Xn. We will not introduce the specific details here. 


3.8 Channel Coding Theorem 


Let $X$ be the input alphabet and $Y$ the output alphabet, and let $\xi$ and $\eta$ be two random variables taking values in $X$ and $Y$, respectively. The probability functions $p(x)$ and $p(y)$ of $X$ and $Y$ and the conditional probability function $p(y \mid x)$ are, respectively,

$$p(x) = P\{\xi = x\}, \quad p(y) = P\{\eta = y\}, \quad p(y \mid x) = P\{\eta = y \mid \xi = x\}.$$

From the total probability formula,

$$p(y \mid x) \ge 0, \ \forall x \in X, y \in Y, \qquad \sum_{y \in Y} p(y \mid x) = 1, \ \forall x \in X. \tag{3.78}$$
If $X$ and $Y$ are finite sets, the conditional probability matrix $T = (p(y \mid x))_{|X| \times |Y|}$ is called the transition probability matrix from $X$ to $Y$, i.e.,

$$T = \begin{pmatrix} p(y_1 \mid x_1) & p(y_2 \mid x_1) & \cdots & p(y_N \mid x_1) \\ p(y_1 \mid x_2) & p(y_2 \mid x_2) & \cdots & p(y_N \mid x_2) \\ \vdots & \vdots & & \vdots \\ p(y_1 \mid x_M) & p(y_2 \mid x_M) & \cdots & p(y_N \mid x_M) \end{pmatrix}, \tag{3.79}$$

where $|X| = M$, $|Y| = N$. By (3.78), each row of the transition probability matrix $T$ sums to 1.


Definition 3.18 (i) A discrete channel consists of a finite information space $X$ as the input alphabet, a finite information space $Y$ as the output alphabet, and a transition probability matrix $T$ from $X$ to $Y$; this discrete channel is denoted $\{X, T, Y\}$. If $X = Y = \mathbb{F}_q$ is the finite field with $q$ elements, then $\{X, T, Y\}$ is called a discrete $q$-ary channel. In particular, if $q = 2$, then $\{X, T, Y\}$ is called a discrete binary channel.

(ii) If $\{X, T, Y\}$ is a discrete $q$-ary channel and $T = I_q$ is the identity matrix of order $q$, then $\{X, I_q, Y\}$ is called a noise-free channel.

(iii) If $\{X, T, Y\}$ is a discrete $q$-ary channel and $T = T'$ is a symmetric matrix of order $q$, then $\{X, T, Y\}$ is called a symmetric channel.


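In a discrete channel $\{X, T, Y\}$, the codeword spaces $X^n$ and $Y^n$ of length $n$ are defined as

$$X^n = \{\mathbf{x} = x_1 x_2 \cdots x_n \mid x_i \in X\}, \quad Y^n = \{\mathbf{y} = y_1 y_2 \cdots y_n \mid y_i \in Y\}, \quad n \ge 1.$$

The probabilities of the joint events $\mathbf{x} = x_1 \cdots x_n$ and $\mathbf{y} = y_1 \cdots y_n$ are defined as

$$p(\mathbf{x}) = p(x_1 \cdots x_n) = \prod_{i=1}^{n} p(x_i), \qquad p(\mathbf{y}) = p(y_1 \cdots y_n) = \prod_{i=1}^{n} p(y_i); \tag{3.80}$$

then $X$ and $Y$ become memoryless sources, and $X^n$ and $Y^n$ are their power spaces, respectively.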


Definition 3.19 A discrete channel $\{X, T, Y\}$ is called a memoryless channel if for any positive integer $n \ge 1$ and any $\mathbf{x} = x_1 \cdots x_n \in X^n$, $\mathbf{y} = y_1 \cdots y_n \in Y^n$, we have

$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{n} p(y_i \mid x_i), \qquad p(x_i y_i) = p(x_1 y_1), \ \forall i \ge 1. \tag{3.81}$$

From the joint event probability condition $p(x_i y_i) = p(x_1 y_1)$ in (3.81), we obtain

$$p(y_i \mid x_i) = \frac{p(x_i y_i)}{p(x_i)} = p(y_1 \mid x_1). \tag{3.82}$$

The above formula shows that in a memoryless channel, the conditional probability $p(y_i \mid x_i)$ does not depend on the position $i$.

Definition 3.19 gives the statistical characterization of a memoryless channel. The following lemma gives a mathematical characterization of a memoryless channel.

Lemma 3.19 A discrete channel $\{X, T, Y\}$ is a memoryless channel if and only if the product information space $XY$ is a memoryless source with power space $(XY)^n = X^n Y^n$.


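Proof If $XY$ is a memoryless source (see Definition 3.9), then for any $n \ge 1$ and $\mathbf{x} = x_1 \cdots x_n \in X^n$, $\mathbf{y} = y_1 \cdots y_n \in Y^n$, $\mathbf{xy} \in X^n Y^n$, we have

$$p(\mathbf{xy}) = p(x_1 \cdots x_n y_1 \cdots y_n) = p(x_1 y_1 \cdots x_n y_n) = \prod_{i=1}^{n} p(x_i y_i).$$

Summing over $\mathbf{y}$ gives $p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i)$, and thus

$$p(\mathbf{x}) p(\mathbf{y} \mid \mathbf{x}) = p(\mathbf{xy}) = \prod_{i=1}^{n} p(x_i) p(y_i \mid x_i) = p(\mathbf{x}) \prod_{i=1}^{n} p(y_i \mid x_i),$$

so we have

$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{n} p(y_i \mid x_i).$$

The condition $p(x_i y_i) = p(x_1 y_1)$ follows from the definition of a memoryless source, so $\{X, T, Y\}$ is a memoryless channel. Conversely, if $\{X, T, Y\}$ is a memoryless channel, then by (3.81),

$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{n} p(y_i \mid x_i) \quad \text{and} \quad p(x_i y_i) = p(x_1 y_1).$$

Then for any $\mathbf{a} = a_1 a_2 \cdots a_n \in (XY)^n$, where $a_i = x_i y_i$, we have

$$p(\mathbf{a}) = p(x_1 y_1 \cdots x_n y_n) = p(\mathbf{x}) p(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{n} p(x_i) p(y_i \mid x_i) = \prod_{i=1}^{n} p(a_i),$$

and $p(a_i) = p(a_1)$. Therefore, $XY$ is a memoryless source; that is, a sequence of independent and identically distributed random vectors $\boldsymbol{\xi} = (\xi_1, \xi_2, \ldots, \xi_n, \ldots)$ takes values in $XY$, and $(XY)^n = X^n Y^n$ is its power space. The Lemma holds.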


The following lemma further characterizes the statistical properties of a memoryless channel.

Lemma 3.20 If $\{X, T, Y\}$ is a discrete memoryless channel, then for all $n \ge 1$ the conditional entropy $H(Y^n \mid X^n)$ and the mutual information $I(X^n, Y^n)$ of the information spaces $X^n$ and $Y^n$ satisfy

$$H(Y^n \mid X^n) = nH(Y \mid X), \qquad I(X^n, Y^n) = nI(X, Y). \tag{3.83}$$

Proof Because $XY$ is a memoryless source, we have

$$H(X^n Y^n) = H((XY)^n) = nH(XY) = nH(X) + nH(Y \mid X).$$

On the other hand, by the addition formula for entropy,

$$H(X^n Y^n) = H(X^n) + H(Y^n \mid X^n) = nH(X) + H(Y^n \mid X^n).$$

Combining the two formulas gives

$$H(Y^n \mid X^n) = nH(Y \mid X).$$

By the definition of mutual information, and since $H(Y^n) = nH(Y)$ for the memoryless source $Y$,

$$I(X^n, Y^n) = H(Y^n) - H(Y^n \mid X^n) = nH(Y) - nH(Y \mid X) = n(H(Y) - H(Y \mid X)) = nI(X, Y).$$

The Lemma holds.
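Lemma 3.20 can be checked numerically on a small example. The sketch below builds the distribution on $X^n$ from (3.80) and the transition matrix from $X^n$ to $Y^n$ from (3.81), then compares $I(X^n, Y^n)$ with $nI(X, Y)$. The function names and the binary channel matrix are hypothetical choices for illustration; enumerating $X^n$ and $Y^n$ is only feasible for small $n$.

```python
import itertools
import math

def mutual_information(px, T):
    """I(X, Y) in bits from input distribution px and transition matrix T."""
    N = len(T[0])
    py = [sum(px[i] * T[i][j] for i in range(len(px))) for j in range(N)]
    return sum(px[i] * T[i][j] * math.log2(T[i][j] / py[j])
               for i in range(len(px)) for j in range(N)
               if px[i] > 0 and T[i][j] > 0)

def power_channel(px, T, n):
    """Distribution on X^n, as in (3.80), and transition matrix X^n -> Y^n, as in (3.81)."""
    xs = list(itertools.product(range(len(px)), repeat=n))
    ys = list(itertools.product(range(len(T[0])), repeat=n))
    pxn = [math.prod(px[a] for a in x) for x in xs]
    Tn = [[math.prod(T[a][b] for a, b in zip(x, y)) for y in ys] for x in xs]
    return pxn, Tn

px, T = [0.3, 0.7], [[0.9, 0.1], [0.2, 0.8]]    # hypothetical binary channel
I1 = mutual_information(px, T)
for n in (1, 2, 3):
    pxn, Tn = power_channel(px, T, n)
    print(n, mutual_information(pxn, Tn), n * I1)   # the two columns agree
```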


Let us now define the channel capacity of a discrete channel; this concept plays an important role in channel coding. First, note that the joint probability distribution $p(xy)$ on the product space $XY$ is uniquely determined by the probability distribution $p(x)$ on $X$ and the transition probability matrix $T$, that is, $p(xy) = p(x) p(y \mid x)$; therefore, the mutual information $I(X, Y)$ of $X$ and $Y$ is also uniquely determined by $p(x)$ and $T$. In fact,

$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(xy) \log \frac{p(xy)}{p(x) p(y)} = \sum_{x \in X} p(x) \sum_{y \in Y} p(y \mid x) \log \frac{p(y \mid x)}{\sum_{x' \in X} p(x') p(y \mid x')}.$$
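The last expression shows that $I(X, Y)$ can be computed from $p(x)$ and $T$ alone. A minimal Python sketch, assuming $T$ is given as a list of rows $(p(y_1 \mid x), \ldots, p(y_N \mid x))$ and logarithms are base 2; the function name `mutual_information` and the sample channel are illustrative.

```python
import math

def mutual_information(px, T, base=2):
    """I(X, Y) from the input distribution p(x) and transition matrix T = (p(y|x)).

    Uses I = sum_x p(x) sum_y p(y|x) log(p(y|x) / p(y)), p(y) = sum_x' p(x') p(y|x').
    """
    assert all(abs(sum(row) - 1) < 1e-9 for row in T), "rows of T must sum to 1 (3.78)"
    M, N = len(px), len(T[0])
    py = [sum(px[i] * T[i][j] for i in range(M)) for j in range(N)]
    return sum(px[i] * T[i][j] * math.log(T[i][j] / py[j], base)
               for i in range(M) for j in range(N)
               if px[i] > 0 and T[i][j] > 0)

bsc = [[0.9, 0.1], [0.1, 0.9]]              # binary symmetric channel, p = 0.1
print(mutual_information([0.5, 0.5], bsc))  # about 0.531
print(mutual_information([0.8, 0.2], bsc))  # smaller for a non-uniform input
```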


Definition 3.20 The channel capacity $B$ of a discrete memoryless channel $\{X, T, Y\}$ is defined as

$$B = \max_{p(x)} I(X, Y), \tag{3.84}$$

where the maximum in (3.84) is taken over all probability distributions $p(x)$ on $X$.


Lemma 3.21 The channel capacity $B$ of a discrete memoryless channel $\{X, T, Y\}$ satisfies

$$0 \le B \le \min\{\log |X|, \log |Y|\}.$$

Proof The mutual information between two information spaces satisfies $I(X, Y) \ge 0$ (see Lemma 3.5), so $B \ge 0$. By Lemma 3.4,

$$I(X, Y) = H(X) - H(X \mid Y) \le H(X) \le \log |X|$$

and

$$I(X, Y) = H(Y) - H(Y \mid X) \le H(Y) \le \log |Y|,$$

so we have

$$0 \le B \le \min\{\log |X|, \log |Y|\}.$$


Computing the channel capacity is a constrained extremum problem: for fixed $T$, $I(X, Y)$ is a concave function of $p(x)$, to be maximized over all probability distributions on $X$. We do not discuss it in detail here, but calculate the channel capacity of two simple channels.


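Example 3.8 The channel capacity of the noise-free channel $\{X, T, Y\}$ is $B = \log |X|$.

Proof Let $\{X, T, Y\}$ be a noise-free channel; then $|X| = |Y|$, and the transition probability matrix $T$ is the identity matrix, so

$$I(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(xy) \log \frac{p(xy)}{p(x) p(y)} = \sum_{x \in X} p(x) \sum_{y \in Y} p(y \mid x) \log \frac{p(y \mid x)}{p(y)}.$$

Because $p(y \mid x) = 0$ if $y \ne x$ and $p(y \mid x) = 1$ if $y = x$ (so that $p(y) = p(x)$ when $y = x$), we have

$$I(X, Y) = \sum_{x \in X} p(x) \log \frac{1}{p(x)} = H(X) \le \log |X|,$$

with equality for the uniform distribution on $X$. Thus

$$B = \max_{p(x)} I(X, Y) = \log |X|.$$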


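Example 3.9 The channel capacity $B$ of the binary symmetric channel $\{X, T, Y\}$ is

$$B = 1 + p \log p + (1-p) \log(1-p) = 1 - H(p),$$

where $0 < p < 1$ is the probability that a transmitted bit is flipped and $H(p) = -p \log p - (1-p) \log(1-p)$ is the binary entropy function.

Proof In the binary symmetric channel $\{X, T, Y\}$, $X = Y = \mathbb{F}_2 = \{0, 1\}$, and $T$ is a symmetric matrix of order 2:

$$T = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$

Let $a$ be the random variable on the input space $\mathbb{F}_2$ and $b$ the random variable on the output space $\mathbb{F}_2$, both following two-point distributions. Then the transition matrix $T$ of the binary symmetric channel can be described more explicitly by the following schematic (Fig. 3.4):

$$P\{b = 0 \mid a = 0\} = P\{b = 1 \mid a = 1\} = 1 - p, \qquad P\{b = 1 \mid a = 0\} = P\{b = 0 \mid a = 1\} = p.$$

Calculating the mutual information $I(X, Y)$, we have

$$I(X, Y) = H(Y) - H(Y \mid X),$$

where

$$H(Y \mid X) = -\sum_{x \in \mathbb{F}_2} p(x) \sum_{y \in \mathbb{F}_2} p(y \mid x) \log p(y \mid x) = -p \log p - (1-p) \log(1-p) = H(p).$$

Since $H(Y) \le \log 2 = 1$, with equality when the input is uniformly distributed (which makes the output uniform as well), we obtain

$$B = \max_{p(x)} I(X, Y) = \max_{p(x)} \{H(Y) - H(p)\} = 1 - H(p).$$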
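Both examples can be checked numerically. The text does not discuss how to compute (3.84) in general; one standard method, not taken from this book, is the Blahut-Arimoto algorithm, which alternately computes the output distribution and re-weights the input distribution, and at each step brackets the capacity between a lower and an upper bound. The sketch below is a minimal version under these assumptions; the function name, iteration cap and tolerance are illustrative choices.

```python
import math

def blahut_arimoto(T, iters=10000, tol=1e-12):
    """Approximate the capacity B = max over p(x) of I(X, Y), in bits.

    T is the transition matrix as a list of rows (p(y_1|x), ..., p(y_N|x)).
    Returns (lower bound on B, input distribution reached).
    """
    M, N = len(T), len(T[0])
    r = [1.0 / M] * M                                   # start from the uniform input
    for _ in range(iters):
        q = [sum(r[i] * T[i][j] for i in range(M)) for j in range(N)]   # output p(y)
        # d[i] = D(T[i] || q) in bits, the relative entropy (3.70) of row i from q
        d = [sum(T[i][j] * math.log2(T[i][j] / q[j]) for j in range(N) if T[i][j] > 0)
             for i in range(M)]
        weights = [r[i] * 2 ** d[i] for i in range(M)]
        total = sum(weights)
        lower, upper = math.log2(total), max(d)        # lower <= B <= upper
        r = [w / total for w in weights]                # multiplicative update of p(x)
        if upper - lower < tol:
            break
    return lower, r

bsc = [[0.9, 0.1], [0.1, 0.9]]                          # Example 3.9 with p = 0.1
noiseless = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # Example 3.8, |X| = 3
print(blahut_arimoto(bsc)[0])        # close to 1 - H(0.1)
print(blahut_arimoto(noiseless)[0])  # close to log2(3)
```

For these two channels the uniform starting input is already optimal, so the loop stops immediately; for an asymmetric channel the input distribution moves away from uniform over the iterations.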


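To state and prove the channel coding theorem, we introduce the concept of jointly typical sequences. By Definition 3.13 in Sect. 3.5 of this chapter, if $X$ is a memoryless source, then for any small $\varepsilon > 0$ and positive integer $n \ge 1$, the set $W_\varepsilon^{(n)}$ of typical sequences in the power space $X^n$ is defined as

$$W_\varepsilon^{(n)} = \left\{\mathbf{x} = x_1 \cdots x_n \in X^n \;\middle|\; \left| -\frac{1}{n} \log p(\mathbf{x}) - H(X) \right| < \varepsilon \right\}.$$

If $\{X, T, Y\}$ is a memoryless channel, then by Lemma 3.19, $XY$ is a memoryless source, and in the power space $(XY)^n = X^n Y^n$ we define the set $W_\varepsilon^{(n)}$ of jointly typical sequences as

$$W_\varepsilon^{(n)} = \left\{\mathbf{xy} \in X^n Y^n \;\middle|\; \left| -\frac{1}{n} \log p(\mathbf{x}) - H(X) \right| < \varepsilon,\ \left| -\frac{1}{n} \log p(\mathbf{y}) - H(Y) \right| < \varepsilon,\ \left| -\frac{1}{n} \log p(\mathbf{xy}) - H(XY) \right| < \varepsilon \right\}. \tag{3.85}$$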


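Fig. 3.4 The transfer matrix

Lemma 3.22 (Asymptotic equipartition) In a memoryless channel $\{X, T, Y\}$, the set $W_\varepsilon^{(n)}$ of jointly typical sequences has the following asymptotic equipartition properties:

(i) $\lim_{n \to \infty} P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} = 1$;

(ii) $(1 - \varepsilon)\, 2^{n(H(XY) - \varepsilon)} \le |W_\varepsilon^{(n)}| \le 2^{n(H(XY) + \varepsilon)}$ for $n$ sufficiently large;

(iii) If $\mathbf{x} \in X^n$, $\mathbf{y} \in Y^n$, and $p(\mathbf{xy}) = p(\mathbf{x}) p(\mathbf{y})$, then for $n$ sufficiently large

$$(1 - \varepsilon)\, 2^{-n(I(X, Y) + 3\varepsilon)} \le P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} \le 2^{-n(I(X, Y) - 3\varepsilon)}.$$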


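Proof By Lemma 3.13, we have

$$-\frac{1}{n} \log p(X^n) \to H(X), \qquad -\frac{1}{n} \log p(Y^n) \to H(Y), \qquad -\frac{1}{n} \log p(X^n Y^n) \to H(XY)$$

in probability as $n \to \infty$. So for given $\varepsilon$, as long as $n$ is sufficiently large, we have

$$P_1 = P\left\{\left| -\frac{1}{n} \log p(\mathbf{x}) - H(X) \right| \ge \varepsilon \right\} < \frac{\varepsilon}{3},$$

$$P_2 = P\left\{\left| -\frac{1}{n} \log p(\mathbf{y}) - H(Y) \right| \ge \varepsilon \right\} < \frac{\varepsilon}{3},$$

$$P_3 = P\left\{\left| -\frac{1}{n} \log p(\mathbf{xy}) - H(XY) \right| \ge \varepsilon \right\} < \frac{\varepsilon}{3},$$

where $\mathbf{x} \in X^n$, $\mathbf{y} \in Y^n$. Thus

$$P\{\mathbf{xy} \notin W_\varepsilon^{(n)}\} \le P_1 + P_2 + P_3 < \varepsilon,$$

hence

$$P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} > 1 - \varepsilon;$$

in other words,

$$\lim_{n \to \infty} P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} = 1.$$

Property (i) holds. To prove (ii), let $\mathbf{x} \in X^n$, $\mathbf{y} \in Y^n$, and $\mathbf{xy} \in W_\varepsilon^{(n)}$; then

$$H(XY) - \varepsilon < -\frac{1}{n} \log p(\mathbf{xy}) < H(XY) + \varepsilon,$$

or equivalently,

$$2^{-n(H(XY) + \varepsilon)} < p(\mathbf{xy}) < 2^{-n(H(XY) - \varepsilon)}.$$

By the total probability formula,

$$1 = \sum_{\mathbf{xy} \in X^n Y^n} p(\mathbf{xy}) \ge \sum_{\mathbf{xy} \in W_\varepsilon^{(n)}} p(\mathbf{xy}) \ge |W_\varepsilon^{(n)}|\, 2^{-n(H(XY) + \varepsilon)},$$

so

$$|W_\varepsilon^{(n)}| \le 2^{n(H(XY) + \varepsilon)}.$$

On the other hand, when $n$ is sufficiently large,

$$1 - \varepsilon < P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} = \sum_{\mathbf{xy} \in W_\varepsilon^{(n)}} p(\mathbf{xy}) \le |W_\varepsilon^{(n)}|\, 2^{-n(H(XY) - \varepsilon)},$$

so

$$(1 - \varepsilon)\, 2^{n(H(XY) - \varepsilon)} \le |W_\varepsilon^{(n)}| \le 2^{n(H(XY) + \varepsilon)};$$

property (ii) holds. Now we prove property (iii). If $p(\mathbf{xy}) = p(\mathbf{x}) p(\mathbf{y})$, then, using $I(X, Y) = H(X) + H(Y) - H(XY)$,

$$P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} = \sum_{\mathbf{xy} \in W_\varepsilon^{(n)}} p(\mathbf{x}) p(\mathbf{y}) \le |W_\varepsilon^{(n)}|\, 2^{-n(H(X) - \varepsilon)}\, 2^{-n(H(Y) - \varepsilon)} \le 2^{n(H(XY) + \varepsilon - H(X) - H(Y) + 2\varepsilon)} = 2^{-n(I(X, Y) - 3\varepsilon)}.$$

The lower bound is proved similarly, so we have

$$(1 - \varepsilon)\, 2^{-n(I(X, Y) + 3\varepsilon)} \le P\{\mathbf{xy} \in W_\varepsilon^{(n)}\} \le 2^{-n(I(X, Y) - 3\varepsilon)}.$$

This completes the proof of the Lemma.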
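Property (i) of Lemma 3.22 can be observed by simulation. The sketch below assumes a binary symmetric channel with crossover probability `p` and an i.i.d. input with $P\{x_i = 1\} =$ `px1` (both hypothetical parameter choices), draws pairs $(\mathbf{x}, \mathbf{y})$, and counts how often all three conditions of (3.85) hold, using base-2 logarithms. It is a Monte Carlo illustration, not a proof.

```python
import math
import random

def joint_typical_fraction(px1, p, n, eps, trials, seed=0):
    """Estimate P{xy in W_eps^(n)} for an i.i.d. binary input with P{x_i = 1} = px1
    sent through a binary symmetric channel with crossover probability p."""
    rng = random.Random(seed)
    py1 = px1 * (1 - p) + (1 - px1) * p                       # P{y_i = 1}
    pxy = {(a, b): (px1 if a else 1 - px1) * (p if a != b else 1 - p)
           for a in (0, 1) for b in (0, 1)}

    def entropy(probs):
        return -sum(q * math.log2(q) for q in probs if q > 0)

    HX, HY, HXY = entropy([px1, 1 - px1]), entropy([py1, 1 - py1]), entropy(pxy.values())
    hits = 0
    for _ in range(trials):
        sx = sy = sxy = 0.0       # -log2 p(x), -log2 p(y), -log2 p(xy), built letter by letter
        for _ in range(n):
            a = 1 if rng.random() < px1 else 0
            b = a ^ (1 if rng.random() < p else 0)            # channel flips a with probability p
            sx -= math.log2(px1 if a else 1 - px1)
            sy -= math.log2(py1 if b else 1 - py1)
            sxy -= math.log2(pxy[(a, b)])
        # the three conditions of (3.85)
        if abs(sx / n - HX) < eps and abs(sy / n - HY) < eps and abs(sxy / n - HXY) < eps:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    print(n, joint_typical_fraction(0.3, 0.1, n, 0.1, 200))
```

For $n = 10$ the estimated fraction is well below 1; it climbs toward 1 as $n$ grows, as property (i) predicts.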


The following lemma has important applications in the proof of the channel coding theorem. In fact, its conclusion is valid in a general probability space.

Lemma 3.23 In a memoryless channel $\{X, T, Y\}$, if the codeword $\mathbf{y} \in Y^n$ is uniquely determined by $\mathbf{x} \in X^n$, and $\mathbf{x}' \in X^n$ is independent of $\mathbf{x}$, then $\mathbf{y}$ and $\mathbf{x}'$ are also independent.

Proof If $\mathbf{y}$ is uniquely determined by $\mathbf{x}$, then $p(\mathbf{x}) = p(\mathbf{y}) = p(\mathbf{xy})$, or $p(\mathbf{y} \mid \mathbf{x}) = 1$. Therefore, the probability of the joint event $\mathbf{yxx}'$ is

$$p(\mathbf{yxx}') = p(\mathbf{xx}') = p(\mathbf{x}) p(\mathbf{x}') = p(\mathbf{y}) p(\mathbf{x}').$$

On the other hand,

$$p(\mathbf{yxx}') = p(\mathbf{yx}').$$

Thus

$$p(\mathbf{yx}') = p(\mathbf{y}) p(\mathbf{x}').$$

The Lemma holds.


To define the error probability of channel transmission, we first introduce the workflow of channel coding. After source compression coding, a source message input set is generated,

$$W = \{1, 2, \ldots, M\}, \quad M > 1 \text{ a positive integer}.$$

An injection $f: W \to X^n$ is called a coding function; $f$ encodes each input message $w \in W$ as $f(w) \in X^n$. When the codeword $\mathbf{x} = f(w) \in X^n$ is transmitted through the channel $\{X, T, Y\}$ and the codeword $\mathbf{y} \in Y^n$ is received, we write $\mathbf{x} \xrightarrow{T} \mathbf{y}$, or $\mathbf{y} = T(\mathbf{x})$. A mapping $g: Y^n \to W$ is called a decoding function. Thus, channel coding is a pair of mappings $(f, g)$. Obviously,

$$C = f(W) = \{f(w) \mid w \in W\} \subset X^n$$

is a code of length $n$ in the codeword space $X^n$ with $|C| = |W| = M$ codewords; $C$ is called the code of $f$. The code rate $R_C$ is

$$R_C = \frac{1}{n} \log |C| = \frac{1}{n} \log M.$$

For each input message $w \in W$, if $g(T(f(w))) \ne w$, the channel transmission is said to be in error, and the transmission error probability $\lambda_w$ is

$$\lambda_w = P\{g(T(f(w))) \ne w\}, \quad w \in W. \tag{3.86}$$

The transmission error probability of the codeword $\mathbf{x} = f(w) \in C$ is denoted $P_e(\mathbf{x})$; obviously $P_e(\mathbf{x}) = \lambda_w$, that is, $P_e(\mathbf{x})$ is the conditional probability

$$P_e(\mathbf{x}) = P\{g(T(\mathbf{x})) \ne w \mid \mathbf{x} = f(w)\} = P\{g(T(f(w))) \ne w\} = \lambda_w. \tag{3.87}$$

We define the transmission error probability $P_e(C)$ of the code $C = f(W) \subset X^n$ as

$$P_e(C) = \frac{1}{M} \sum_{w=1}^{M} \lambda_w = \frac{1}{M} \sum_{\mathbf{x} \in C} P_e(\mathbf{x}). \tag{3.88}$$

As before, a code $C$ of length $n$ with $M$ codewords is denoted $C = (n, M)$.
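To make (3.86) to (3.88) concrete, here is a sketch that computes $P_e(C)$ exactly for a hypothetical example not taken from the text: the binary repetition code $C = \{0\cdots0, 1\cdots1\}$ with $M = 2$, sent over a binary symmetric channel with crossover probability $p$ and decoded by majority vote ($n$ odd). The function name is illustrative, and the code enumerates all $2^n$ error patterns, so it is only meant for small $n$.

```python
import itertools
import math

def repetition_code_error(n, p):
    """Exact P_e(C) of (3.88) for the repetition code C = {0...0, 1...1} of odd length n
    on a binary symmetric channel with crossover probability p, majority-vote decoding."""
    f = {1: (0,) * n, 2: (1,) * n}                  # coding function f: W -> X^n, M = 2

    def g(y):                                       # decoding function g: Y^n -> W
        return 2 if sum(y) > n // 2 else 1

    lam = {}
    for w, x in f.items():
        err = 0.0
        for flips in itertools.product((0, 1), repeat=n):        # every error pattern
            y = tuple(a ^ e for a, e in zip(x, flips))           # received word T(x)
            prob = math.prod(p if e else 1 - p for e in flips)   # p(y|x), memoryless
            if g(y) != w:
                err += prob
        lam[w] = err                                # lambda_w, formula (3.86)
    return sum(lam.values()) / len(f)               # average over W, formula (3.88)

for n in (1, 3, 5):
    print(n, 1 / n, repetition_code_error(n, 0.1))  # length, rate R_C, P_e(C)
```

For $p = 0.1$ the error probability falls from 0.1 ($n = 1$) to 0.028 ($n = 3$) and 0.00856 ($n = 5$), but the rate $R_C = 1/n$ also tends to 0. Theorem 3.12 asserts that, at any fixed rate below the channel capacity, the error probability can be driven to 0 instead.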


Theorem 3.12 (Shannon's channel coding theorem, 1948) Let $\{X, T, Y\}$ be a memoryless channel with channel capacity $B$. Then

(i) When $R < B$, there exists a sequence of codes $C_n = (n, 2^{\lfloor nR \rfloor})$ whose transmission error probability $P_e(C_n)$ satisfies

$$\lim_{n \to \infty} P_e(C_n) = 0; \tag{3.89}$$

(ii) Conversely, if the transmission error probability of the codes $C_n = (n, 2^{\lfloor nR \rfloor})$ satisfies Eq. (3.89), then there is an absolute constant $N_0$ such that the code rate $R_{C_n}$ of $C_n$ satisfies

$$R_{C_n} \le B, \quad \text{when } n \ge N_0.$$

If $C_n = (n, 2^{\lfloor nR \rfloor})$, then by Lemma 2.27 of Chap. 2,

$$R - \frac{1}{n} < R_{C_n} \le R. \tag{3.90}$$

So (i) of Theorem 3.12 indicates that when the code rate is below, and arbitrarily close to, the channel capacity $B$, "good codes" with arbitrarily small transmission error probability exist; (ii) indicates that the code rate of such good codes does not exceed the channel capacity. Shannon's proof of Theorem 3.12 uses the random coding technique; this idea of using a random method to prove a deterministic result is widely used in information theory, and it has found more and more applications in other fields.


Proof (Proof of Theorem 3.12) First, choose an arbitrary probability function $p(x)$ on the input alphabet $X$, and define the joint probability on the power space $X^n$ as

$$p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i), \quad \mathbf{x} = x_1 \cdots x_n \in X^n. \tag{3.91}$$

In this way we obtain a memoryless source $X$ and its power space $X^n$, which constitute the codeword space of the channel code. Then $M = 2^{\lfloor nR \rfloor}$ codewords are selected at random in $X^n$ to obtain a random code $C_n = (n, 2^{\lfloor nR \rfloor})$. To describe the randomness of the codeword selection, we borrow the source message set $W = \{1, 2, \ldots, M\}$, where $M = 2^{\lfloor nR \rfloor}$. For every message $w$, $1 \le w \le M$, the randomly generated codeword is denoted $X^{(n)}(w)$. So we get a random code

$$C_n = \{X^{(n)}(1), X^{(n)}(2), \ldots, X^{(n)}(M)\} \subset X^n.$$

The generation probability $P\{C_n\}$ of $C_n$ is

$$P\{C_n\} = \prod_{w=1}^{M} p(X^{(n)}(w)) = \prod_{w=1}^{M} \prod_{i=1}^{n} p(x_i(w)),$$

where $X^{(n)}(w) = x_1(w) x_2(w) \cdots x_n(w) \in X^n$.

Let $A_n = \{C_n\}$ be the set of all random codes $C_n$, called the random code set. The average transmission error probability over the random code set $A_n$ is defined as

$$P_e(A_n) = \sum_{C_n \in A_n} P\{C_n\} P_e(C_n). \tag{3.92}$$

If we can prove that for any $\varepsilon > 0$, $P_e(A_n) < \varepsilon$ when $n$ is sufficiently large, then there is at least one code $C_n \in A_n$ with $P_e(C_n) < \varepsilon$, which proves (i). We prove it in two steps.

(1) Principles of constructing random codes, encoding and decoding

We select each message in the source message set $W = \{1, 2, \ldots, M\}$ with equal probability; that is, for $w \in W$, the selection probability of $w$ is

$$p(w) = \frac{1}{M} = 2^{-\lfloor nR \rfloor}, \quad w = 1, 2, \ldots, M.$$

In this way $W$ becomes an equiprobable information space. Each input message $w$ is randomly coded as $X^{(n)}(w) \in X^n$, where

$$X^{(n)}(w) = x_1(w) x_2(w) \cdots x_n(w) \in X^n.$$

The codeword $X^{(n)}(w)$ is transmitted through the memoryless channel $\{X, T, Y\}$ with conditional probability

$$p(\mathbf{y} \mid X^{(n)}(w)) = \prod_{i=1}^{n} p(y_i \mid x_i(w)),$$

and the codeword $\mathbf{y} = y_1 y_2 \cdots y_n \in Y^n$ is received. The decoding rule for $\mathbf{y}$ is: if $X^{(n)}(w)$ is the only input codeword such that $X^{(n)}(w)\mathbf{y}$ is jointly typical, that is, $X^{(n)}(w)\mathbf{y} \in W_\varepsilon^{(n)}$, then decode $g(\mathbf{y}) = w$; if there is no such codeword, or if two or more codewords are jointly typical with $\mathbf{y}$, then $\mathbf{y}$ cannot be decoded correctly.

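(2) Estimating the average error probability of the random code set $A_n$

By (3.92) and (3.88),

$$P_e(A_n) = \sum_{C_n \in A_n} P\{C_n\} P_e(C_n) = \sum_{C_n \in A_n} P\{C_n\} \frac{1}{M} \sum_{\mathbf{x} \in C_n} P_e(\mathbf{x}) = \frac{1}{M} \sum_{w=1}^{M} \sum_{C_n \in A_n} P\{C_n\} \lambda_w = \frac{1}{M} \sum_{w=1}^{M} \lambda_w,$$

where $\lambda_w$ is given by Eq. (3.86), here averaged over the random code set. Because each $w$ is input, and hence encoded, with equal probability, the transmission error probability $\lambda_w$ does not depend on $w$, that is,

$$\lambda_1 = \lambda_2 = \cdots = \lambda_M. \tag{3.93}$$

By (3.93), we have $P_e(A_n) = \lambda_1$. To estimate $\lambda_1$, let $\mathbf{y} = T(X^{(n)}(1))$ be the output when message 1 is sent, and define

$$E_i = \{\mathbf{y} \in Y^n \mid X^{(n)}(i)\mathbf{y} \in W_\varepsilon^{(n)}\}, \quad i = 1, 2, \ldots, M. \tag{3.94}$$

Let $E_1^c = Y^n \setminus E_1$ be the complement of $E_1$. By the decoding rule,

$$\lambda_1 = P(E_1^c \cup E_2 \cup \cdots \cup E_M) \le P(E_1^c) + \sum_{i=2}^{M} P(E_i). \tag{3.95}$$

By property (i) of Lemma 3.22,

$$\lim_{n \to \infty} P\{\mathbf{xy} \notin W_\varepsilon^{(n)}\} = 0,$$

so

$$\lim_{n \to \infty} P\{X^{(n)}(1)\mathbf{y} \notin W_\varepsilon^{(n)}\} = 0.$$

Therefore, when $n$ is sufficiently large,

$$P(E_1^c) < \varepsilon.$$

Obviously, the codeword $X^{(n)}(1)$ and the other codewords $X^{(n)}(i)$ $(i = 2, \ldots, M)$ are independent of each other (see (3.91)). By Lemma 3.23, $\mathbf{y} = T(X^{(n)}(1))$ and $X^{(n)}(i)$ $(i \ne 1)$ are also independent of each other. Then by property (iii) of Lemma 3.22,

$$P(E_i) = P\{X^{(n)}(i)\mathbf{y} \in W_\varepsilon^{(n)}\} \le 2^{-n(I(X, Y) - 3\varepsilon)} \quad (i \ne 1).$$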
To sum up,

$$P_e(A_n) = \lambda_1 \le \varepsilon + \sum_{i=2}^{M} 2^{-n(I(X, Y) - 3\varepsilon)} \le \varepsilon + 2^{nR}\, 2^{-n(I(X, Y) - 3\varepsilon)} = \varepsilon + 2^{-n(I(X, Y) - R - 3\varepsilon)}.$$

If $R < I(X, Y)$, then $I(X, Y) - R - 3\varepsilon > 0$ (when $\varepsilon$ is sufficiently small), so when $n$ is large enough we have $P_e(A_n) < 2\varepsilon$. Since the channel capacity is $B = \max_{p(x)} I(X, Y)$, we can choose $p(x)$ so that $B = I(X, Y)$. So when $R < B$, we have $P_e(A_n) < 2\varepsilon$, which completes the proof of (i).

To prove (ii), we first look at a special case. If the error probability of $C = (n, 2^{\lfloor nR \rfloor})$ is $P_e(C) = 0$, then $R < B + \frac{1}{n}$, so as long as $n$ is sufficiently large, the code rate satisfies $R_C \le B$.

Indeed, because $P_e(C) = 0$, the decoding function $g: Y^n \to W$ determines $W$ uniquely, so $H(W \mid Y^n) = 0$. Because $W$ is an equiprobable information space,

$$H(W) = \log |W| = \lfloor nR \rfloor.$$

Using the decomposition of mutual information, we have

$$I(W, Y^n) = H(W) - H(W \mid Y^n) = H(W) = \lfloor nR \rfloor. \tag{3.96}$$

On the other hand, $W \to X^n \to Y^n$ forms a Markov chain, so by the data processing inequality (see Theorem 3.8),

$$I(W, Y^n) \le I(X^n, Y^n).$$

By Lemma 3.20,

$$I(W, Y^n) \le I(X^n, Y^n) \le nI(X, Y) \le nB.$$

By (3.96), $\lfloor nR \rfloor \le nB$. Because $nR - 1 < \lfloor nR \rfloor \le nR$, we get $nR < nB + 1$, that is, $R < B + \frac{1}{n}$. By (3.90),

$$R_C \le R < B + \frac{1}{n}.$$

Since this holds for every sufficiently large $n$, we obtain $R \le B$, and thus $R_C \le B$.

The above shows that when the transmission error probability is 0, we have $R_C \le B$ as long as $n$ is sufficiently large. Next, suppose transmission errors are allowed, that is, the error probability of $C_n = (n, 2^{\lfloor nR \rfloor})$ satisfies $P_e(C_n) < \varepsilon$. Then, when $n$ is sufficiently large, we still have $R_{C_n} \le B$.
To prove this conclusion, note that the error probability of the random code $C_n$ is

$$P_e(C_n) = \lambda_w, \tag{3.97}$$

where $w \in W$ is any given message. For given $w$, we define a random variable $\xi_w$ taking values in $\{0, 1\}$ as

$$\xi_w = \begin{cases} 1, & \text{if } g(T(f(w))) \ne w; \\ 0, & \text{if } g(T(f(w))) = w. \end{cases}$$

Let $E = (\mathbb{F}_2, \xi_w)$ be the resulting binary information space; by (3.97), we have

$$P_e(C_n) = P\{\xi_w = 1\}.$$

By Theorem 3.3,

$$H(EW \mid Y^n) = H(W \mid Y^n) + H(E \mid WY^n) = H(E \mid Y^n) + H(W \mid EY^n). \tag{3.98}$$

Note that $E$ is uniquely determined by $Y^n$ and $W$, so $H(E \mid WY^n) = 0$. At the same time, since $E$ is a binary information space, $H(E) \le \log 2 = 1$, so

$$H(E \mid Y^n) \le H(E) \le 1.$$

On the other hand, given $Y^n$, if $\xi_w = 0$ then $W = g(Y^n)$ is determined, while if $\xi_w = 1$ then $W$ takes at most $|W| - 1$ values, so

$$H(W \mid EY^n) \le P_e(C_n) \log(|W| - 1) \le nRP_e(C_n).$$

By (3.98), we have

$$H(W \mid Y^n) \le 1 + nRP_e(C_n).$$

Because $f(W) = X^{(n)}(W)$ is a function of $W$, we have the following Fano inequality:

$$H(f(W) \mid Y^n) \le H(W \mid Y^n) \le 1 + nRP_e(C_n).$$

Finally,

$$\lfloor nR \rfloor = H(W) = H(W \mid Y^n) + I(W, Y^n) \le H(W \mid Y^n) + I(f(W), Y^n) \le 1 + nRP_e(C_n) + I(X^n, Y^n) \le 1 + nRP_e(C_n) + nB.$$

Because $nR - 1 < \lfloor nR \rfloor$, we have

$$nR < 2 + nRP_e(C_n) + nB.$$

Thus, by (3.90),

$$R_{C_n} \le R < B + \frac{2}{n} + RP_e(C_n).$$

By (3.89), $P_e(C_n) \to 0$, so letting $n \to \infty$ gives $R \le B$; hence $R_{C_n} \le R \le B$, which completes the proof of the theorem.


It can be seen from Example 3.9 that the channel capacity of a binary symmetric channel is $B = 1 - H(p)$. Therefore, Theorem 3.12 extends Theorem 2.10 of the previous chapter to general memoryless channels; at the same time, it also proves that the code rate of a good code does not exceed the channel capacity.


Exercise 3 


1. The joint probability functions of the two information spaces $X$ and $Y$ are as follows:

Find $H(X)$, $H(Y)$, $H(XY)$, $H(X \mid Y)$, $H(Y \mid X)$, and $I(X, Y)$.

2. Let $X_1, X_2, X_3$ be three information spaces on $\mathbb{F}_2$. Given $I(X_1, X_2) = 0$ and $I(X_1, X_2, X_3) = 1$, prove:

$$H(X_3) = 1, \quad \text{and} \quad H(X_1 X_2 X_3) \le 2.$$

3. Give an example to illustrate $I(X, Y \mid Z) > I(X, Y)$.

4. Can $I(X, Y \mid Z) = 0$ be derived from $I(X, Y) = 0$? Conversely, can $I(X, Y) = 0$ be deduced from $I(X, Y \mid Z) = 0$? Prove it or give counterexamples.

5. Let $X, Y, Z$ be three information spaces. Prove:

(i) $H(XY \mid Z) \ge H(X \mid Z)$;
(ii) $I(XY, Z) \ge I(X, Z)$;
(iii) $H(XYZ) - H(XY) \le H(XZ) - H(X)$;
(iv) $I(X, Z \mid Y) = I(Z, Y \mid X) - I(Z, Y) + I(X, Z)$.

Also explain under what conditions equality holds.

6. Can $I(X, Y) = 0$ imply $I(X, Z) = I(X, Z \mid Y)$?

7. Let the information space be $X = \{0, 1, 2, \ldots\}$ and let the probability distribution of the random variable $\xi$ be

$$p(n) = P\{\xi = n\}, \quad n = 0, 1, \ldots.$$

Given the mathematical expectation $E\xi = A > 0$ of $\xi$, find the probability distribution $\{p(n) \mid n = 0, 1, \ldots\}$ that maximizes $H(X)$, and the corresponding maximum information entropy.

8. Let the information space be $X = \{0, 1, 2, \ldots\}$. Give an example of a random variable $\xi$ taking values in $X$ such that $H(X) = \infty$.

9. Let $X_1 = (X, \xi)$ and $X_2 = (X, \eta)$ be two information spaces, and let $\xi$ be a function of $\eta$. Prove $H(X_1) \le H(X_2)$, and explain this result.

10. Let $X_1 = (X, \xi)$ and $X_2 = (X, \eta)$ be two information spaces with $\eta = f(\xi)$. Prove:

(i) $H(X_1) \ge H(X_2)$, and give the conditions under which equality holds;
(ii) $H(X_1 \mid X_2) \ge H(X_2 \mid X_1)$, and give the conditions under which equality holds.
Chapter 4
Cryptosystem and Authentication System


In 1949, Shannon published the famous paper “Communication Theory of Secrecy Systems” in the Bell System Technical Journal. Building on the mathematical theory of information he had established in 1948 (see Chap. 3), this paper discusses the problem of secure communication comprehensively and establishes a mathematical theory of secure communication systems. It had a great impact on the later development of cryptography. It is generally believed that Shannon transformed cryptography from an art (a collection of creative techniques) into a science, and he is therefore also known as the father of modern cryptography. The main purpose of this chapter is to introduce Shannon’s important ideas and results in cryptographic theory, which form the cornerstone of modern cryptography.


4.1 Definition and Statistical Characteristics of Cryptosystem


Let $X = \{a_1, a_2, \ldots, a_q\}$ be the plaintext alphabet, regarded as a source, and let $\{\xi_i\}_{i=1}^{\infty}$ be a sequence of random variables taking values in X. For any given positive integer $n \ge 1$, we define the plaintext space P as the product information space $X_1X_2\cdots X_n$, that is

$$P = X_1X_2\cdots X_n, \quad \text{where } X_i = (X, \xi_i),\ 1 \le i \le n.$$

If $m = m_1m_2\cdots m_n \in P$ ($m_i \in X_i$), m is called a plaintext string of length n, and its joint probability $p(m)$ is defined as

$$p(m) = p(m_1m_2\cdots m_n) = P\{\xi_1 = m_1, \xi_2 = m_2, \ldots, \xi_n = m_n\}. \tag{4.1}$$
Let $Z = \{b_1, b_2, \ldots, b_s\}$ be the key alphabet, which is also a memoryless source (see Definition 3.9 of Chap. 3), and let $\{\eta_i\}_{i=1}^{\infty}$ be a sequence of independent random variables taking values in Z, each uniformly distributed. Then for any $b \in Z$,

$$P\{\eta_i = b\} = \frac{1}{|Z|} = \frac{1}{s}, \quad \forall\, i \ge 1. \tag{4.2}$$

We define the power space $Z^r$ as the key space, denoted by K, that is

$$K = Z^r = \{k = k_1k_2\cdots k_r \mid k_i \in Z, 1 \le i \le r\}.$$

Each $k = k_1k_2\cdots k_r \in K$ is called a key of length r, and its joint probability $p(k)$ is

$$p(k) = p(k_1k_2\cdots k_r) = \prod_{i=1}^{r} p(k_i) = \frac{1}{|Z|^r} = \frac{1}{|K|}. \tag{4.3}$$


This shows that the r-dimensional random vector $\eta = (\eta_1, \eta_2, \ldots, \eta_r)$ taking values in the key space K is also uniformly distributed on K. Unless otherwise specified, we stipulate that the plaintext space P and the key space K are independent information spaces, that is

$$p(mk) = p(m)p(k), \quad \forall\, m \in P, k \in K. \tag{4.4}$$
Every $k \in K$ defines (controls) an encryption transformation $E_k$; denote

$$E = \{E_k \mid k \in K\}.$$

E is called the encryption algorithm. When $k \in K$ is given, the encryption transformation $E_k$ acts on a plaintext $m \in P$ to produce the ciphertext $E_k(m)$. Each encryption transformation $E_k$ is an injection; its left inverse mapping, denoted $D_k$, is called the decryption transformation. Let $1_P$ be the identity transformation of the plaintext space, that is $1_P(m) = m$ for all $m \in P$; then

$$D_kE_k = 1_P, \quad \text{or} \quad D_k(E_k(m)) = m, \ \forall\, m \in P. \tag{4.5}$$
Define the ciphertext space C as

$$C = \{E_k(m) \mid m \in P, k \in K\} \subset X_1X_2\cdots X_n. \tag{4.6}$$

That is, the ciphertext space C and the plaintext space P have the same alphabet and the same string length.

For each ciphertext $c \in C$ with $c = E_k(m)$, c is uniquely determined by the plaintext m and the key k, so we can define the occurrence probability $p(c)$ of the ciphertext c as

$$p(c) = p(E_k(m)) = p(km) = p(k)p(m). \tag{4.7}$$

Obviously,

$$\sum_{c \in C} p(c) = \sum_{k \in K}\sum_{m \in P} p(km) = \sum_{k \in K} p(k) \sum_{m \in P} p(m) = 1.$$

Therefore, the ciphertext space C with the probabilities defined by (4.7) is also an information space.
When $k \in K$ is given, let

$$A_k = \{E_k(m) \mid m \in P\} \subset C. \tag{4.8}$$

Then the encryption transformation $E_k$ maps P onto $A_k$; being also injective, $E_k$ is a one-to-one correspondence $P \to A_k$, and its inverse mapping is the decryption transformation $D_k$, that is

$$D_kE_k = 1_P, \quad E_kD_k = 1_{A_k}, \quad k \in K.$$

We denote the set of all decryption transformations by D, that is

$$D = \{D_k \mid k \in K\}. \tag{4.9}$$

D is called the decryption algorithm.


Definition 4.1 Under the above provisions, $\mathfrak{R} = \{P, C, K, E, D\}$ is called a cryptosystem, where P, C, K are information spaces, K and P are statistically independent, E is the encryption algorithm and D is the decryption algorithm.


The statistical characteristics of a cryptosystem are summarized in the following theorem.

Theorem 4.1 If $\mathfrak{R} = \{P, C, K, E, D\}$ is a cryptosystem, then
(1) for all $c \in C$,

$$p(c) = \sum_{\substack{k \in K\\ c \in A_k}} p(k)\,p(D_k(c)); \tag{4.10}$$

(2) for $c \in C$, $m \in P$,

$$p(c|m) = \sum_{\substack{k \in K\\ E_k(m) = c}} p(k); \tag{4.11}$$

(3) for $c \in C$, $m \in P$,

$$p(m|c) = \frac{p(m)\sum_{k \in K,\ E_k(m) = c} p(k)}{\sum_{k \in K,\ c \in A_k} p(k)\,p(D_k(c))}, \tag{4.12}$$

where $A_k$ is given by (4.8).


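Proof By (4.7), if $c \in C$, then

$$p(c) = \sum_{\substack{k \in K, m \in P\\ E_k(m) = c}} p(km) = \sum_{\substack{k \in K, m \in P\\ E_k(m) = c}} p(k)p(m) = \sum_{k \in K} p(k) \sum_{\substack{m \in P\\ E_k(m) = c}} p(m) = \sum_{\substack{k \in K\\ c \in A_k}} p(k)\,p(D_k(c)),$$

where the last step uses the fact that, for fixed k, the equation $E_k(m) = c$ has the unique solution $m = D_k(c)$ when $c \in A_k$ and no solution otherwise. So (1) holds. (2) is trivial: since K and P are independent, when $m \in P$ is given, the occurrence probability of the ciphertext c is

$$p(c|m) = \sum_{\substack{k \in K\\ E_k(m) = c}} p(k).$$

To prove (3), by (1.24),

$$p(m|c) = \frac{p(m)\,p(c|m)}{\sum_{m' \in P} p(m')\,p(c|m')}.$$

The denominator is

$$\sum_{m' \in P} p(m')\,p(c|m') = \sum_{m' \in P} p(m') \sum_{\substack{k \in K\\ E_k(m') = c}} p(k) = \sum_{k \in K} p(k) \sum_{\substack{m' \in P\\ E_k(m') = c}} p(m') = \sum_{\substack{k \in K\\ c \in A_k}} p(k)\,p(D_k(c)).$$

So in the end

$$p(m|c) = \frac{p(m)\sum_{k \in K,\ E_k(m) = c} p(k)}{\sum_{k \in K,\ c \in A_k} p(k)\,p(D_k(c))}.$$

This proves Theorem 4.1.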


By Theorem 4.1, the statistical characteristics of a cryptosystem can be summarized as follows: the probability distribution of the ciphertext space and the conditional probability distribution of the plaintext given the ciphertext are completely determined by the probability distributions of the plaintext space and the key space. That is, anyone who knows the probability distributions of the plaintext space and the key space also knows the distribution of the ciphertext and the conditional distribution of the plaintext given the ciphertext.
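To make formulas (4.10) to (4.12) concrete, the following Python sketch evaluates them on a hypothetical toy system, not taken from the text: single-letter plaintexts $P = \mathbb{Z}_4$ with a non-uniform prior, keys $K = \{0, 1\}$ chosen uniformly, and $E_k(m) = m + k \bmod 4$. Exact fractions are used so that the results can be compared exactly with a brute-force computation of the joint probabilities $p(k)p(m)$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy system (not from the text): P = Z_4 with a non-uniform prior,
# K = {0, 1} uniform, E_k(m) = (m + k) mod 4 and D_k(c) = (c - k) mod 4.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
K = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def E(k, m):
    return (m + k) % 4

def D(k, c):
    return (c - k) % 4

C = sorted({E(k, m) for k in K for m in P})
A = {k: {E(k, m) for m in P} for k in K}  # the sets A_k of (4.8)

def p_c(c):
    """(4.10): p(c) = sum over k with c in A_k of p(k) p(D_k(c))."""
    return sum(K[k] * P[D(k, c)] for k in K if c in A[k])

def p_c_given_m(c, m):
    """(4.11): p(c|m) = sum over k with E_k(m) = c of p(k)."""
    return sum(K[k] for k in K if E(k, m) == c)

def p_m_given_c(m, c):
    """(4.12): p(m|c) = p(m) p(c|m) / p(c)."""
    return P[m] * p_c_given_m(c, m) / p_c(c)

# Brute-force joint probabilities p(m, c) = sum of p(k) p(m), using independence (4.4).
joint = {}
for k, m in product(K, P):
    key = (m, E(k, m))
    joint[key] = joint.get(key, Fraction(0)) + K[k] * P[m]

for c in C:
    assert p_c(c) == sum(v for (m, cc), v in joint.items() if cc == c)
    for m in P:
        assert p_m_given_c(m, c) == joint.get((m, c), Fraction(0)) / p_c(c)

print({c: str(p_c(c)) for c in C})  # {0: '5/16', 1: '3/8', 2: '3/16', 3: '1/8'}
```

The plaintext distribution is visibly reshaped but not hidden: with only two keys, an observed ciphertext narrows the plaintext down to two candidates.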

Since the plaintext space and the key space are assumed to be statistically independent, (3.14) of Theorem 3.2 in Chap. 3 gives

$$H(PK) = H(P) + H(K). \tag{4.13}$$


It has been specified that the key alphabet Z is a memoryless, equiprobable information space, so the probability of each key in $K = \{k = k_1k_2\cdots k_r \mid k_i \in Z\}$ is

$$p(k) = \prod_{i=1}^{r} p(k_i) = \frac{1}{|Z|^r} = \frac{1}{|K|}. \tag{4.14}$$

Therefore, the key space is also an equiprobable information space.

By the definition of a cryptosystem, once a plaintext and a key are given, the ciphertext is completely determined; conversely, once the ciphertext and the key are known, the plaintext is also determined. Combined with Lemma 3.3 of the previous chapter, we have

Theorem 4.2 In a cryptosystem $\mathfrak{R} = \{P, C, K, E, D\}$,

$$H(P|KC) = 0, \quad H(C|KP) = 0, \quad H(K|PC) = 0.$$


Proof We only prove $H(P|KC) = 0$; the proofs of $H(C|KP) = 0$ and $H(K|PC) = 0$ are similar. For given $m \in P$, let

$$N_m = \{kc \mid c \in C, k \in K \text{ and } E_k(m) = c\}.$$

Thus $N_m \subset KC$, and

$$p(m|kc) = 1 \ \text{if } kc \in N_m; \qquad p(m|kc) = 0 \ \text{if } kc \notin N_m.$$

Indeed, if $kc \in N_m$ occurs, then $E_k(m) = c$, so $m = D_k(c)$ is determined. Conversely, if $kc \notin N_m$, then m cannot occur together with the joint event kc, so $p(m|kc) = 0$. By Lemma 3.3 of the previous chapter, $H(P|KC) = 0$, which completes the proof of Theorem 4.2.


Corollary 4.1 In a cryptosystem $\mathfrak{R} = \{P, C, K, E, D\}$, we always have

$$H(P) \le H(C).$$

Proof P and K are statistically independent, and $H(C|PK) = H(P|KC) = 0$ by Theorem 4.2, so

$$\begin{aligned} H(P) = H(P|K) &= H(P|K) + H(C|PK) \\ &= H(PC|K) \\ &= H(C|K) + H(P|KC) \\ &= H(C|K) \\ &\le H(C). \end{aligned}$$

The Corollary holds.

The Corollary shows that in a cryptosystem the uncertainty of the plaintext never exceeds that of the ciphertext.


4.2 Fully Confidential System 


Generally speaking, the mutual information $I(P, C)$ between the plaintext space and the ciphertext space (see Definition 3.8 in the previous chapter) measures the information about the plaintext contained in the ciphertext, so minimizing $I(P, C)$ is an important design goal of a cryptosystem. If the ciphertext provides no information about the plaintext, that is, an analyst cannot obtain any information about the plaintext by observing the ciphertext, the cryptosystem is called completely confidential.


Definition 4.2 A cryptosystem $\mathfrak{R} = \{P, C, K, E, D\}$ is called a complete secrecy system, or an unconditional secrecy system, if $H(P|C) = H(P)$, or equivalently $I(P, C) = 0$.


Theorem 4.3 For any cryptosystem $\mathfrak{R} = \{P, C, K, E, D\}$, we have

$$I(P, C) \ge H(P) - H(K). \tag{4.15}$$

Proof By Theorem 4.2, $H(K|PC) = 0$ and $H(P|KC) = 0$, so

$$\begin{aligned} H(P|C) &= H(P|C) + H(K|PC) \\ &= H(PK|C) \\ &= H(K|C) + H(P|KC). \end{aligned}$$

So we have

$$H(P|C) = H(K|C) \le H(K).$$

By definition,

$$I(P, C) = H(P) - H(P|C) \ge H(P) - H(K).$$

So the Theorem holds.


Since mutual information satisfies $I(X, Y) \ge 0$ (see the previous chapter), we have

Corollary 4.2 In a completely confidential cryptosystem $\mathfrak{R} = \{P, C, K, E, D\}$, we always have

$$H(P) \le H(K) = \log_2 |K|. \tag{4.16}$$

Proof Since $\mathfrak{R}$ is completely confidential, $I(P, C) = 0$ by definition. From the above theorem,

$$H(P) - H(K) \le I(P, C) = 0,$$

thus $H(P) \le H(K)$. By (4.14), K is uniformly distributed, so $H(K) = \log_2 |K|$ and $H(P) \le \log_2 |K|$.


It follows that complete secrecy requires a key space with $\log_2 |K| \ge H(P)$; in this sense, the larger the key space $|K|$, the better the confidentiality of the system.


Definition 4.3 A cryptosystem $\mathfrak{R} = \{P, C, K, E, D\}$ is called a “one-time pad” (literally, “one secret at a time”) system if for any given $m \in P$ and $c \in C$ there is a unique key $k \in K$ such that $c = E_k(m)$.

By this definition, for a given $m \in P$, if $k \ne k'$, then $E_k(m) \ne E_{k'}(m)$. In other words, each plaintext-ciphertext pair is linked by exactly one key, which is the origin of the name “one secret at a time”. Thus, for any given plaintext $m \in P$ and ciphertext $c \in C$, there is exactly one key $k \in K$ such that $E_k(m) = c$. Therefore, for fixed c, when k traverses the key space K, $m = D_k(c)$ traverses the plaintext space P, and each m appears exactly once. Thus, for $c \in C$,

$$p(c) = \sum_{\substack{k \in K, m \in P\\ E_k(m) = c}} p(k)p(m) = \frac{1}{|K|}\sum_{\substack{k \in K, m \in P\\ E_k(m) = c}} p(m) = \frac{1}{|K|}\sum_{m \in P} p(m) = \frac{1}{|K|}. \tag{4.17}$$

That is to say, in a one-time pad system the ciphertext space C is also an equiprobable information space.
Theorem 4.4 The one-time pad system is a completely confidential system.

Proof When c and m are given, by (4.11),

$$p(c|m) = \sum_{\substack{k \in K\\ E_k(m) = c}} p(k) = \frac{1}{|K|}.$$

By (4.12) and (4.17),

$$p(m|c) = \frac{p(m)\cdot\frac{1}{|K|}}{\sum_{m' \in P} p(m')\cdot\frac{1}{|K|}} = \frac{p(m)}{\sum_{m' \in P} p(m')} = p(m).$$

Thus

$$\begin{aligned} H(P|C) &= -\sum_{m \in P}\sum_{c \in C} p(mc)\log_2 p(m|c) \\ &= -\sum_{m \in P}\sum_{c \in C} p(mc)\log_2 p(m) \\ &= -\sum_{m \in P} p(m)\log_2 p(m) \\ &= H(P). \end{aligned}$$

Therefore, $\mathfrak{R} = \{P, C, K, E, D\}$ is a completely confidential system.
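The following Python sketch checks Theorem 4.4 by exhaustive enumeration on a hypothetical 2-bit one-time pad: $P = K = C = \{0, 1, 2, 3\}$ (2-bit strings), uniformly random keys, and $E_k(m) = m \oplus k$ (bitwise XOR). The skewed plaintext prior is an arbitrary choice; the point is that $p(m|c) = p(m)$ and $H(P|C) = H(P)$ hold regardless of it.

```python
import math
from fractions import Fraction

# Hypothetical 2-bit one-time pad: P = K = C = {0, 1, 2, 3}, E_k(m) = m XOR k,
# keys uniform, plaintext prior deliberately skewed.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
K = {k: Fraction(1, 4) for k in range(4)}

# Joint distribution p(m, c) = sum over keys with E_k(m) = c of p(k) p(m).
joint = {}
for m, pm in P.items():
    for k, pk in K.items():
        c = m ^ k
        joint[(m, c)] = joint.get((m, c), Fraction(0)) + pk * pm

p_c = {c: sum(v for (_, cc), v in joint.items() if cc == c) for c in range(4)}

# (4.17): every ciphertext occurs with probability 1/|K|.
assert all(p == Fraction(1, 4) for p in p_c.values())
# Theorem 4.4: p(m|c) = p(m) for every plaintext m and ciphertext c.
assert all(joint[(m, c)] / p_c[c] == P[m] for m in P for c in range(4))

def entropy(dist):
    """Shannon entropy in bits of {outcome: probability}."""
    return -sum(float(p) * math.log2(float(p)) for p in dist.values() if p > 0)

# H(P|C) = -sum over (m, c) of p(m, c) log2 p(m|c)
H_P_given_C = -sum(float(v) * math.log2(float(v / p_c[c])) for (_, c), v in joint.items())
print(entropy(P), H_P_given_C)
```

Both printed values are 1.75 bits: observing the ciphertext leaves the uncertainty about the plaintext unchanged.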


4.3 Ideal Security System 


To introduce Shannon’s concepts of the unique solution distance (usually called the unicity distance) and the ideal cryptosystem, we first consider the ciphertext-only attack. In a ciphertext-only attack, when the cryptanalyst intercepts a ciphertext c, he may decrypt c with every decryption transformation $D_k$ to obtain

$$m = D_k(c), \quad k \in K.$$

He then records the keys corresponding to all meaningful messages m; exactly one of these keys is the correct key, while the other, incorrect keys are called pseudo keys. A ciphertext-only attack requires a large number of ciphertexts as samples. Therefore, we consider the product spaces $P^n$ and $C^n$ of the plaintext and ciphertext spaces; the joint events in $P^n$ and $C^n$ are plaintext strings and ciphertext strings.
Definition 4.4 For a ciphertext string $y \in C^n$ of given length n, let

$$K(y) = \{k \in K \mid \exists\, x \in P^n \text{ such that } E_k(x) = y\}. \tag{4.18}$$

Then the number of pseudo keys is $|K(y)| - 1$. The mathematical expectation $S_n$ of the number of pseudo keys is defined as

$$S_n = \sum_{y \in C^n} p(y)\,(|K(y)| - 1). \tag{4.19}$$

Thus the expected number of pseudo keys is the weighted average of the number of pseudo keys over all ciphertext strings. We first prove the following two theorems.


Theorem 4.5 If $\mathfrak{R} = \{P, C, K, E, D\}$ is a cryptosystem, then

$$H(K|C) = H(K) + H(P) - H(C). \tag{4.20}$$

Proof From the addition formula of information entropy (see Theorem 3.2 in the previous chapter),

$$H(KPC) = H(KP) + H(C|KP) = H(KC) + H(P|KC).$$

By Theorem 4.2, $H(C|KP) = H(P|KC) = 0$, thus

$$H(KP) = H(KC).$$

Again by the addition formula, and noting that K and P are statistically independent,

$$H(KP) = H(P) + H(K|P) = H(P) + H(K)$$

and

$$H(P) + H(K) = H(KC) = H(C) + H(K|C).$$

So we have

$$H(K|C) = H(P) + H(K) - H(C).$$

The Theorem holds.
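Identity (4.20) can be checked numerically. The Python sketch below uses a hypothetical toy system, not from the text: $P = \mathbb{Z}_4$ with a non-uniform prior, $K = \{0, 1\}$ uniform and independent of P, and $E_k(m) = m + k \bmod 4$. It computes $H(K|C)$ directly as $H(KC) - H(C)$ and compares it with the right-hand side of (4.20).

```python
import math
from fractions import Fraction

# Hypothetical toy system: P = Z_4 with a non-uniform prior, K = {0, 1} uniform
# and independent of P, E_k(m) = (m + k) mod 4.
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
K = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def entropy(dist):
    """Shannon entropy in bits of {outcome: probability}."""
    return -sum(float(p) * math.log2(float(p)) for p in dist.values() if p > 0)

# Joint distribution of (k, c) and marginal distribution of c.
p_kc, p_c = {}, {}
for k, pk in K.items():
    for m, pm in P.items():
        c = (m + k) % 4
        p_kc[(k, c)] = p_kc.get((k, c), 0) + pk * pm
        p_c[c] = p_c.get(c, 0) + pk * pm

H_K_given_C = entropy(p_kc) - entropy(p_c)           # definition: H(KC) - H(C)
rhs = entropy(K) + entropy(P) - entropy(p_c)          # right-hand side of (4.20)
print(round(H_K_given_C, 6), round(rhs, 6))
```

The two values agree up to floating-point rounding, and $H(K|C) < H(K) = 1$: the ciphertext leaks part of the key.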


Theorem 4.6 Let $\mathfrak{R} = \{P, C, K, E, D\}$ be a cryptosystem with $|C| = |P|$, and let r be the redundancy of P. Then the expected number of pseudo keys $S_n$ for ciphertext strings of given length n satisfies

$$S_n \ge \frac{2^{H(K)}}{|P|^{nr}} - 1. \tag{4.21}$$

Proof From the definition and properties of product spaces,

$$\mathfrak{R}_n = \{P^n, C^n, K, E_n, D_n\}$$

also constitutes a cryptosystem. By Theorem 4.5,

$$H(K|C^n) = H(K) + H(P^n) - H(C^n).$$

By (3.9), we have

$$H(C^n) \le n\log_2 |C|, \quad |C| = |P|.$$

Replacing the information space X with P in the results of the previous chapter, we have

$$\begin{aligned} H(P^n) &= H(P^{n-1}) + H(P|P^{n-1}) \\ &\ge H(P^{n-1}) + H_\infty \\ &\ge \cdots \ge nH_\infty. \end{aligned}$$

So we have

$$H(P^n) \ge nH_\infty = n(1 - r)H_0 = n(1 - r)\log_2 |P|.$$

Combined with the above formulas, we obtain the estimate

$$H(K|C^n) \ge H(K) + n(1 - r)\log_2 |P| - n\log_2 |P| = H(K) - nr\log_2 |P|. \tag{4.22}$$

On the other hand, by definition,

$$\begin{aligned} H(K|C^n) &= -\sum_{y \in C^n}\sum_{k \in K} p(ky)\log_2 p(k|y) \\ &= -\sum_{y \in C^n} p(y)\sum_{k \in K} p(k|y)\log_2 p(k|y) \\ &= -\sum_{y \in C^n} p(y)\sum_{k \in K(y)} p(k|y)\log_2 p(k|y), \end{aligned}$$

since $p(k|y) = 0$ for $k \notin K(y)$. Hence

$$\sum_{k \in K(y)} p(k|y) = \sum_{k \in K} p(k|y) = 1.$$

Then by Jensen's inequality,

$$H(K|C^n) \le \sum_{y \in C^n} p(y)\log_2 |K(y)| \le \log_2 \sum_{y \in C^n} p(y)|K(y)| = \log_2(S_n + 1). \tag{4.23}$$

Finally, (4.21) follows from (4.22) and (4.23): $\log_2(S_n + 1) \ge H(K) - nr\log_2 |P|$, that is, $S_n + 1 \ge 2^{H(K)}/|P|^{nr}$. This completes the proof.


When the expected number of pseudo keys is greater than 0, the key cannot, in theory, be uniquely determined by a ciphertext-only attack. We therefore define the unique solution distance of a cryptosystem as the smallest value of n for which $S_n = 0$.


Definition 4.5 A cryptosystem whose unique solution distance is infinite is called 
an ideal security system. 


From Theorem 4.6, setting $S_n = 0$ in (4.21) gives an approximate value of the unique solution distance:

$$n_0 \approx \frac{H(K)}{r\log_2 |P|}.$$
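As a rough illustration of this estimate, the Python sketch below evaluates $n_0$ for a simple substitution cipher on the 26-letter English alphabet. Two inputs are assumptions, not taken from the text: keys are uniformly random permutations of the alphabet, so $H(K) = \log_2 26!$, and the redundancy of English is taken as $r \approx 0.75$, a commonly quoted estimate.

```python
import math

def unicity_distance(key_entropy_bits: float, redundancy: float, alphabet_size: int) -> float:
    """Approximate unique solution distance n0 = H(K) / (r * log2 |P|)."""
    return key_entropy_bits / (redundancy * math.log2(alphabet_size))

# Assumed example: simple substitution on 26 letters with uniformly random keys,
# so H(K) = log2(26!); r = 0.75 is a commonly quoted estimate for English.
H_K = math.log2(math.factorial(26))
n0 = unicity_distance(H_K, 0.75, 26)
print(f"H(K) = {H_K:.2f} bits, n0 is about {n0:.1f} letters")
```

With these values $n_0$ comes out at roughly 25 letters; the estimate is only as good as the assumed redundancy r.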


The unique solution distance indicates the minimum amount of ciphertext for which an exhaustive key search can possibly succeed in identifying the key. Generally speaking, the greater the unique solution distance, the better the confidentiality of the system. However, Shannon only establishes the existence of the unique solution distance and gives no concrete procedure for computing it. In practice, the amount of ciphertext required to break a cipher is far greater than the theoretical value of the unique solution distance.


4.4 Message Authentication 


An authentication system, also known as an authentication code, is an important tool for ensuring the authenticity and integrity of messages. In 1984, Simmons systematically put forward an information theory of authentication systems for the first time. He used mathematics to study the theoretical and practical security of authentication systems, and established both the performance limits of authentication systems and the mathematical principles that should be followed in designing authentication codes. Although Simmons’ theory is not yet mature and complete, its position in the theory of authentication systems is as important as that of Shannon’s theory for cryptosystems, and it lays the theoretical foundation for research on the mathematical theory of authentication systems.

In cryptography, authentication includes entity authentication and message authentication; we mainly discuss message authentication systems. At present, there are two main models of authentication systems. One is the authentication model without an arbiter. In this model, the participants are the message sender, the message receiver and the attacker; the sender and receiver trust each other and share the same key information. The other is the authentication model with an arbiter. In this model, besides the sender, receiver and attacker, the participants include an arbiter. The sender and receiver do not trust each other, but both trust the arbiter, who shares key information with each of them.

An authentication system without secrecy (confidentiality) and without an arbiter consists of four parts: a finite set S of source states, called the source set; a finite set A of authentication tags, called the tag set; a key space K consisting of all possible keys; and a set of authentication rules $E = \{e_k(s) \mid k \in K, s \in S\}$, where for any $k \in K$, $e_k$ is an authentication rule, that is, a mapping $S \to A$.


Definition 4.6 An authentication system is $T = \{S, A, K, E\}$, where S, A, K are information spaces: S is the source space (source set), A is the tag space (tag set), and K is the key space, with S and K statistically independent, and

$$E = \{e_k(s) \mid k \in K, s \in S\}.$$

Each $e_k$ is a mapping $S \to A$, called an authentication rule.


Definition 4.7 The product space SA is called the message space and is denoted by M.

Authentication protocol: The sender and receiver use the following protocol to transmit information. First, they secretly select and share a random key $k \in K$. If the sender wants to transmit a source state $s \in S$ to the receiver, the sender computes $a = e_k(s)$ and sends the message $sa \in M$ to the receiver. When the receiver receives the message sa, he computes $a' = e_k(s)$ himself; if $a' = a$, he accepts the message as authentic, otherwise he rejects the message sa.
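A minimal Python sketch of this protocol follows. As an assumption not made in the text, the rule $e_k(s)$ is instantiated with HMAC-SHA256 from the standard library, truncated to 4 bytes; this is a computationally secure MAC standing in for an abstract authentication rule, not one of the unconditionally secure codes analysed in this chapter. The names `e`, `send` and `receive` are illustrative only.

```python
import hashlib
import hmac
import secrets

# Assumption: e_k(s) is instantiated with HMAC-SHA256 truncated to TAG_BYTES bytes.
# This is a computationally secure MAC, used here only as a stand-in for an
# abstract authentication rule; it is not an unconditionally secure code.
TAG_BYTES = 4  # hypothetical tag length, so |A| = 2**32

def e(k: bytes, s: bytes) -> bytes:
    """Authentication rule e_k(s): map a source state s to a tag a."""
    return hmac.new(k, s, hashlib.sha256).digest()[:TAG_BYTES]

def send(k: bytes, s: bytes) -> tuple[bytes, bytes]:
    """Sender: compute a = e_k(s) and transmit the message sa."""
    return s, e(k, s)

def receive(k: bytes, message: tuple[bytes, bytes]) -> bool:
    """Receiver: recompute a' = e_k(s) and accept the message iff a' == a."""
    s, a = message
    return hmac.compare_digest(e(k, s), a)

k = secrets.token_bytes(16)  # key secretly chosen and shared in advance
msg = send(k, b"transfer 100")
print(receive(k, msg))                        # True: genuine message accepted
print(receive(k, (b"transfer 900", msg[1])))  # rejected unless the 32-bit tag happens to match
```

Using `hmac.compare_digest` for the check $a' = a$ avoids leaking, through timing, how many leading bytes of a forged tag were correct.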


Definition 4.8 The matrix $[e_k(s)]_{|K| \times |S|}$ is called the authentication matrix. Its rows are indexed by the keys $k \in K$ and its columns by the source states $s \in S$; it is a $|K| \times |S|$ matrix whose entry in row k and column s is $e_k(s)$.

The authentication matrix is an important tool in authentication theory. Explicitly, let $K = \{k_1, k_2, \ldots, k_n\}$ and $S = \{s_1, s_2, \ldots, s_m\}$. Then the authentication matrix is the $n \times m$ matrix

$$\begin{pmatrix} e_{k_1}(s_1) & e_{k_1}(s_2) & \cdots & e_{k_1}(s_m) \\ e_{k_2}(s_1) & e_{k_2}(s_2) & \cdots & e_{k_2}(s_m) \\ \vdots & \vdots & & \vdots \\ e_{k_n}(s_1) & e_{k_n}(s_2) & \cdots & e_{k_n}(s_m) \end{pmatrix}.$$
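For a concrete picture, the Python sketch below builds the authentication matrix of a hypothetical toy code chosen for illustration: $S = \{0, 1\}$, $K = \{0,1\}^2$, $A = \{0, 1\}$, with $e_k(0) = k_1$ and $e_k(1) = k_2$, so each half of the key authenticates one source state.

```python
from itertools import product

# Hypothetical toy code: S = {0, 1}, K = {0,1}^2, A = {0, 1},
# with e_k(0) = k_1 and e_k(1) = k_2.
S = [0, 1]
K = list(product([0, 1], repeat=2))

def e(k, s):
    """Authentication rule e_k(s): the key bit assigned to source state s."""
    return k[s]

# Rows are indexed by keys k, columns by source states s; entry (k, s) is e_k(s).
matrix = [[e(k, s) for s in S] for k in K]
for k, row in zip(K, matrix):
    print(k, row)
```

Each column contains every tag equally often, which is exactly what keeps a forger's guess at 1/2 in this example.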
4.5 Forgery Attack


In message authentication, the attacker is a man in the middle. We usually consider two types of attacks, forgery attacks and substitution attacks, which correspond to the ciphertext-only attack and the known-plaintext attack on a cryptosystem. In a forgery attack, the attacker sends a message $sa \in M$ into the channel and hopes that the receiver will accept it as authentic. In a substitution attack, the attacker first observes a message $sa \in M$ in the channel and analyzes the encoding rule currently in use; he then replaces the message sa with $s'a' \in M$, where $s' \ne s$, and hopes that the receiver will accept it as a genuine message.

We assume that the attacker adopts the optimal deception strategy. Let $p_{d_0}$ denote the maximum probability that a forgery attacker succeeds in deception, and $p_{d_1}$ the maximum probability that a substitution attacker succeeds. The probability $p_d$ that the attacker succeeds in deception is defined as

$$p_d = \max\{p_{d_0}, p_{d_1}\}. \tag{4.24}$$

Simmons’ theory mainly estimates lower bounds of $p_d$, providing a theoretical basis for constructing authentication codes whose attack success probability $p_d$ is as small as possible.

First, let us look at the definition and estimation of the maximum probability $p_{d_0}$ of successful deception by a forgery attacker.

Let $A = \{a_1, a_2, \ldots, a_r\}$ be the authentication tag space. The attacker first selects a source state $s \in S$ and an authentication tag $a \in A$. Let $k_0 \in K$ be the key shared by the sender and receiver; if $a = e_{k_0}(s)$, the forgery attacker successfully deceives the receiver. We use $\mathrm{payoff}(s, a)$ to denote the probability that the receiver accepts sa as a genuine message, that is

$$\mathrm{payoff}(s, a) = p(a = e_{k_0}(s)) = \sum_{\substack{k \in K\\ e_k(s) = a}} p(k). \tag{4.25}$$

If the attacker adopts the optimal strategy, then

$$p_{d_0} = \max\{\mathrm{payoff}(s, a) \mid s \in S, a \in A\}. \tag{4.26}$$


Theorem 4.7 If the size of the authentication tag space A is $|A| = r$, then for any fixed source state $s \in S$ there is always an authentication tag $a \in A$ such that

$$\mathrm{payoff}(s, a) \ge \frac{1}{|A|} = \frac{1}{r}, \quad \text{thus} \quad p_{d_0} \ge \frac{1}{r}.$$

Proof By the definition of $\mathrm{payoff}(s, a)$,

$$\sum_{a \in A} \mathrm{payoff}(s, a) = \sum_{a \in A}\sum_{\substack{k \in K\\ e_k(s) = a}} p(k).$$

When a runs through the entries of column s of the authentication matrix, k traverses the whole key space, each key being counted exactly once (namely for $a = e_k(s)$), so

$$\sum_{a \in A} \mathrm{payoff}(s, a) = \sum_{k \in K} p(k) = 1.$$

Therefore, there is at least one $a \in A$ such that

$$\mathrm{payoff}(s, a) \ge \frac{1}{|A|} = \frac{1}{r}.$$


Theorem 4.8 Let $T = \{S, A, K, E\}$ be a message authentication system, and let

$$p_{d_0} = \max\{\mathrm{payoff}(s, a) \mid s \in S, a \in A\}$$

be the maximum probability of a successful forgery attack. Then

$$\log_2 p_{d_0} \ge H(K|SA) - H(K)$$

and

$$p_{d_0} \ge \frac{1}{2^{H(K) - H(K|SA)}}.$$


Proof By definition, $p_{d_0}$ is not less than the mathematical expectation of $\mathrm{payoff}(s, a)$, that is

$$p_{d_0} \ge \sum_{s \in S, a \in A} p(sa)\,\mathrm{payoff}(s, a).$$

Then by Jensen's inequality (concavity of $\log_2$), we have

$$\log_2 p_{d_0} \ge \log_2 \sum_{s \in S, a \in A} p(sa)\,\mathrm{payoff}(s, a) \ge \sum_{s \in S, a \in A} p(sa)\log_2 \mathrm{payoff}(s, a).$$

Obviously,

$$\mathrm{payoff}(s, a) = p(a|s).$$

Thus

$$p(sa) = p(s)p(a|s) = p(s)\,\mathrm{payoff}(s, a). \tag{4.27}$$

So

$$\log_2 p_{d_0} \ge \sum_{s \in S}\sum_{a \in A} p(sa)\log_2 \mathrm{payoff}(s, a) = \sum_{s \in S}\sum_{a \in A} p(sa)\log_2 p(a|s) = -H(A|S).$$

Because the source space S and the key space K are statistically independent,

$$H(SK) = H(K) + H(S).$$

Also, the tag space A is completely determined by the source space S and the key space K, so

$$H(A|KS) = 0.$$

By the addition formula of information entropy,

$$H(KAS) = H(AS) + H(K|AS) = H(S) + H(A|S) + H(K|AS).$$

On the other hand,

$$H(KAS) = H(KS) + H(A|KS) = H(KS) = H(K) + H(S).$$

Altogether, we have

$$-H(A|S) = H(K|AS) - H(K).$$

Thus

$$\log_2 p_{d_0} \ge -H(A|S) = H(K|AS) - H(K).$$

This completes the proof of the theorem.


With $M = SA$ the message space, Theorem 4.8 shows that the maximum success probability $p_{d_0}$ of a forgery attack satisfies

$$p_{d_0} \ge \frac{1}{2^{I(K, M)}},$$

where $I(K, M) = H(K) - H(K|M)$ is the average mutual information between the key space and the message space. The larger the mutual information $I(K, M)$, the smaller this lower bound, so a low forgery success probability is possible only if the messages carry a substantial amount of information about the key. Conversely, if the mutual information is small, the success probability of a forgery attack is necessarily high.
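The quantities in (4.25), (4.26) and the bound above can be evaluated by brute force on a small example. The Python sketch below uses a hypothetical toy code chosen for illustration: $S = \{0, 1\}$ with a uniform source, $K = \{0,1\}^2$ uniform, $A = \{0, 1\}$, $e_k(0) = k_1$, $e_k(1) = k_2$. It computes $p_{d_0}$ and $2^{-I(K, M)}$ with $I(K, M) = H(K) - H(K|M)$.

```python
import math
from fractions import Fraction
from itertools import product

# Hypothetical toy code: S = {0, 1} with a uniform source, K = {0,1}^2 uniform,
# A = {0, 1}, and e_k(0) = k_1, e_k(1) = k_2.
S = {0: Fraction(1, 2), 1: Fraction(1, 2)}
K = {k: Fraction(1, 4) for k in product([0, 1], repeat=2)}
A = [0, 1]

def e(k, s):
    return k[s]

def payoff(s, a):
    """(4.25): probability that the forged message sa is accepted."""
    return sum(pk for k, pk in K.items() if e(k, s) == a)

p_d0 = max(payoff(s, a) for s in S for a in A)  # (4.26)

def entropy(dist):
    """Shannon entropy in bits of {outcome: probability}."""
    return -sum(float(p) * math.log2(float(p)) for p in dist.values() if p > 0)

# Joint law of (k, m) with m = (s, e_k(s)), and the marginal law of m.
p_km, p_m = {}, {}
for (k, pk), (s, ps) in product(K.items(), S.items()):
    m = (s, e(k, s))
    p_km[(k, m)] = p_km.get((k, m), 0) + pk * ps
    p_m[m] = p_m.get(m, 0) + pk * ps

I_KM = entropy(K) - (entropy(p_km) - entropy(p_m))  # I(K, M) = H(K) - H(K|M)
print(p_d0, 2 ** (-I_KM))
```

For this code both values equal 1/2, so the bound of Theorem 4.8 (and that of Theorem 4.7, $1/|A| = 1/2$) is attained.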
4.6 Substitution Attack


In a substitution attack, the attacker first observes a message (s, a) in the channel and then replaces (s, a) with a message (s', a'), hoping that the receiver will accept (s', a') as genuine. The maximum success probability $p_{d_1}$ of a substitution attack is harder to analyze than that of a forgery attack, mainly because $p_{d_1}$ depends on both the probability distribution of the source state space S and that of the key space K.

Let (s', a') and (s, a) be two messages with $s \ne s'$. We use $\mathrm{payoff}(s', a', s, a)$ to denote the probability that substituting (s', a') for (s, a) deceives the receiver, that is

$$\mathrm{payoff}(s', a', s, a) = p(a' = e_{k_0}(s') \mid a = e_{k_0}(s)), \quad k_0 \in K.$$

This is the conditional probability of $a' = e_{k_0}(s')$ given $a = e_{k_0}(s)$ under the same key $k_0$, so

$$\mathrm{payoff}(s', a', s, a) = \frac{p(a' = e_{k_0}(s'), a = e_{k_0}(s))}{p(a = e_{k_0}(s))} = \frac{\sum_{k \in K,\ e_k(s) = a,\ e_k(s') = a'} p(k)}{\mathrm{payoff}(s, a)}. \tag{4.28}$$

When the message $(s, a) \in M$ is given, the attacker uses the optimal strategy to maximize his probability of successful deception, so let

$$p_{sa} = \max\{\mathrm{payoff}(s', a', s, a) \mid s' \in S, s' \ne s, a' \in A\}. \tag{4.29}$$

Regarding $p_{sa}$ as a random variable, its mathematical expectation over the message set $M = SA$ is

$$p_{d_1} = \sum_{s \in S, a \in A} p(sa)\,p_{sa}. \tag{4.30}$$

This is the formal definition of $p_{d_1}$: the weighted average over the message space M of the maximum success probability $\max \mathrm{payoff}(s', a', s, a)$.
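Formulas (4.28) to (4.30) can likewise be evaluated by brute force. The Python sketch below again uses the hypothetical toy code $e_k(0) = k_1$, $e_k(1) = k_2$ with uniform $S = \{0, 1\}$ and uniform $K = \{0,1\}^2$ (an illustrative assumption), and uses $p(sa) = p(s)\,\mathrm{payoff}(s, a)$ from (4.27).

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy code: S = {0, 1} uniform, K = {0,1}^2 uniform,
# A = {0, 1}, e_k(0) = k_1, e_k(1) = k_2.
S = {0: Fraction(1, 2), 1: Fraction(1, 2)}
K = {k: Fraction(1, 4) for k in product([0, 1], repeat=2)}
A = [0, 1]

def e(k, s):
    return k[s]

def payoff(s, a):
    """(4.25): probability that e_{k0}(s) = a."""
    return sum(pk for k, pk in K.items() if e(k, s) == a)

def payoff_sub(s2, a2, s, a):
    """(4.28): success probability of replacing the observed (s, a) by (s2, a2)."""
    both = sum(pk for k, pk in K.items() if e(k, s) == a and e(k, s2) == a2)
    return both / payoff(s, a)

def p_sa(s, a):
    """(4.29): the attacker's best substitution after observing (s, a)."""
    return max(payoff_sub(s2, a2, s, a) for s2 in S if s2 != s for a2 in A)

# (4.30), with p(sa) = p(s) * payoff(s, a) as in (4.27).
p_d1 = sum(S[s] * payoff(s, a) * p_sa(s, a) for s in S for a in A if payoff(s, a) > 0)
print(p_d1)
```

Observing (s, a) reveals one key bit but nothing about the other, so the substitution attacker does no better than guessing: $p_{d_1} = 1/2$.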
Like Theorem 4.7, we have

Theorem 4.9 Let $T = \{S, A, K, E\}$ be an authentication code with $|A| = r$. Then for any given $s', s \in S$ with $s \ne s'$ and any $a \in A$, there is a tag $a' \in A$ such that

$$\mathrm{payoff}(s', a', s, a) \ge \frac{1}{|A|} = \frac{1}{r}.$$

So we have

$$p_{d_1} \ge \frac{1}{r}.$$

Proof By (4.28),

$$\sum_{a' \in A} \mathrm{payoff}(s', a', s, a) = \frac{1}{\mathrm{payoff}(s, a)}\sum_{a' \in A}\sum_{\substack{k \in K\\ e_k(s) = a,\ e_k(s') = a'}} p(k) = \frac{1}{\mathrm{payoff}(s, a)}\sum_{\substack{k \in K\\ e_k(s) = a}} p(k) = 1.$$

So there is at least one $a' \in A$ such that

$$\mathrm{payoff}(s', a', s, a) \ge \frac{1}{r}.$$

By the definition of $p_{sa}$, for all $s \in S$ and $a \in A$ we have

$$p_{sa} \ge \frac{1}{r}.$$

Thus

$$p_{d_1} = \sum_{s \in S, a \in A} p(sa)\,p_{sa} \ge \frac{1}{r}\sum_{s \in S, a \in A} p(sa) = \frac{1}{r}.$$

Theorem 4.10 Let $T = \{S, A, K, E\}$ be an authentication code. For any $(s, a) \in M$, consider the substitution of (s', a') for (s, a), and let $p_{d_1}$ be the mathematical expectation of $p_{sa}$ over the space M. Then

$$\log_2 p_{d_1} \ge H(K|M^2) - H(K|M)$$

and

$$p_{d_1} \ge \frac{1}{2^{H(K|M) - H(K|M^2)}}.$$

Proof By (4.29), $p_{sa}$ is not less than the mathematical expectation of $\mathrm{payoff}(s', a', s, a)$ over $s' \in S$, $a' \in A$, that is

$$p_{sa} \ge \sum_{s' \in S, a' \in A} p(s'a'|sa)\,\mathrm{payoff}(s', a', s, a).$$

By (4.30) and Jensen's inequality, we have

$$\begin{aligned} \log_2 p_{d_1} &\ge \sum_{s \in S, a \in A} p(sa)\log_2 p_{sa} \\ &\ge \sum_{s \in S, a \in A} p(sa)\sum_{s' \in S, a' \in A} p(s'a'|sa)\log_2 \mathrm{payoff}(s', a', s, a) \\ &= \sum_{s \in S, a \in A}\sum_{s' \in S, a' \in A} p(sas'a')\log_2 \mathrm{payoff}(s', a', s, a) \\ &= \sum_{s \in S, a \in A}\sum_{s' \in S, a' \in A} p(sas'a')\log_2 p(a's'|as) \\ &= -H(M|M), \end{aligned}$$

where $H(M|M)$ denotes the conditional entropy of the second message given the first. In addition,

$$H(KM^2) = H(M) + H(M|M) + H(K|M^2) = H(M) + H(K|M) + H(M|KM).$$

So

$$-H(M|M) = H(K|M^2) - H(K|M) - H(M|KM).$$

It can be proved that

$$H(M|KM) = 0.$$

So

$$-H(M|M) = H(K|M^2) - H(K|M).$$

Thus

$$\log_2 p_{d_1} \ge H(K|M^2) - H(K|M),$$

that is

$$p_{d_1} \ge \frac{1}{2^{H(K|M) - H(K|M^2)}}.$$

The Theorem holds.


Definition 4.9 An authentication code $\{S, A, K, E\}$ is called perfect if

$$p_d = 2^{H(K|M) - H(K)}.$$

Theorem 4.11 Perfect authentication systems exist.


Proof The theorem is proved by a direct construction. First, let the source state space be $S = \{0, 1\}$. Let N be a positive even number, and define the tag space A and the key space K as follows:

$$A = \mathbb{Z}_2^{N/2} = \{a_1a_2\cdots a_{N/2} \mid a_i \in \mathbb{Z}_2, 1 \le i \le N/2\}$$

and

$$K = \mathbb{Z}_2^{N} = \{k_1k_2\cdots k_N \mid k_i \in \mathbb{Z}_2, 1 \le i \le N\}.$$

The authentication rule $e_k(s)$ determined by $k = k_1k_2\cdots k_{N/2}k_{N/2+1}\cdots k_N$ is defined as

$$e_k(0) = k_1k_2\cdots k_{N/2}$$

and

$$e_k(1) = k_{N/2+1}\cdots k_N.$$

Assuming that all $2^N$ keys k are selected with equal probability, for $s \in S$ and $a \in A$ we have

$$\mathrm{payoff}(s, a) = p(a = e_k(s)) = 2^{-N/2}.$$

So $p_{d_0} = 2^{-N/2}$, and similarly $p_{d_1} = 2^{-N/2}$, so

$$p_d = 2^{-N/2}.$$

It is easy to calculate that

$$H(K|M) - H(K) = -\frac{N}{2} = H(K|M^2) - H(K|M).$$

So

$$p_d = 2^{H(K|M) - H(K)}.$$

Therefore, $\{S, A, K, E\}$ is a perfect authentication system.
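The construction can be checked by brute force for a small N. The Python sketch below takes N = 4 (an arbitrary choice) and, as an additional assumption needed to compute entropies, a uniform source on $S = \{0, 1\}$; it computes $p_{d_0}$, $p_{d_1}$ and $H(K) - H(K|M)$ directly from the definitions.

```python
import math
from fractions import Fraction
from itertools import product

N = 4  # hypothetical even key length; the construction works for any even N
half = N // 2
S = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # assumed uniform source
K = {k: Fraction(1, 2 ** N) for k in product([0, 1], repeat=N)}
A = list(product([0, 1], repeat=half))

def e(k, s):
    """e_k(0) = first half of k, e_k(1) = second half of k."""
    return k[:half] if s == 0 else k[half:]

def payoff(s, a):
    return sum(pk for k, pk in K.items() if e(k, s) == a)

def payoff_sub(s2, a2, s, a):
    both = sum(pk for k, pk in K.items() if e(k, s) == a and e(k, s2) == a2)
    return both / payoff(s, a)

p_d0 = max(payoff(s, a) for s in S for a in A)
# With S = {0, 1}, the only admissible substitute state for s is 1 - s.
p_d1 = sum(S[s] * payoff(s, a) * max(payoff_sub(1 - s, a2, s, a) for a2 in A)
           for s in S for a in A)

def entropy(dist):
    """Shannon entropy in bits of {outcome: probability}."""
    return -sum(float(p) * math.log2(float(p)) for p in dist.values() if p > 0)

p_km, p_m = {}, {}
for (k, pk), (s, ps) in product(K.items(), S.items()):
    m = (s, e(k, s))
    p_km[(k, m)] = p_km.get((k, m), 0) + pk * ps
    p_m[m] = p_m.get(m, 0) + pk * ps
H_K_given_M = entropy(p_km) - entropy(p_m)

print(p_d0, p_d1, entropy(K) - H_K_given_M)
```

For N = 4 this gives $p_{d_0} = p_{d_1} = 1/4$ and $H(K) - H(K|M) = 2 = N/2$, in agreement with the proof.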


4.7 Basic Algorithm 


4.7.1 Affine Transformation 


Encryption with matrices originates from the classical Vigenère cipher. Let $X = \{a_1, a_2, \ldots, a_N\}$ be a plaintext alphabet of N characters; we identify the characters of X with the elements of $\mathbb{Z}_N$, the residue class ring mod N. Let $P = \mathbb{Z}_N^k$ be the plaintext space; $x = x_1x_2\cdots x_k \in P$ is called a plaintext unit, or a plaintext message of length k. Let $M_k(\mathbb{Z}_N)$ be the full matrix ring of order k over $\mathbb{Z}_N$, let $A \in M_k(\mathbb{Z}_N)$ be an invertible matrix of order k, and let $b = b_1b_2\cdots b_k \in \mathbb{Z}_N^k$ be a given vector. Each plaintext unit $x = x_1x_2\cdots x_k$ in P is encrypted by the affine transformation (A, b):

$$\begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_k \end{pmatrix} = A\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}, \tag{4.32}$$

where $x = x_1x_2\cdots x_k$ is the plaintext and $x' = x'_1x'_2\cdots x'_k$ is the ciphertext. The decryption algorithm is

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} = A^{-1}\left[\begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_k \end{pmatrix} - \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}\right]. \tag{4.33}$$


Since the affine transformation (A, b) is a one-to-one correspondence $\mathbb{Z}_N^k \to \mathbb{Z}_N^k$ with inverse transformation $(A^{-1}, -A^{-1}b)$, encrypting with (A, b) yields the so-called higher-order affine cryptosystem. This cryptosystem was first proposed by the mathematician Lester Hill in the American Mathematical Monthly in 1929, so it is also called the Hill cryptosystem.
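A minimal Python sketch of (4.32) and (4.33) for block length k = 2 over $\mathbb{Z}_{26}$ follows. The assumptions are ours: lowercase letters a to z are identified with 0 to 25, the plaintext length is even, and the key matrix and shift vector are arbitrary examples. The 2 × 2 inverse uses the adjugate formula of Lemma 4.2, with `pow(D, -1, N)` (Python 3.8+) supplying $D^{-1} \bmod N$.

```python
N = 26  # alphabet size; letters a..z are identified with 0..25

def mat_vec(A, x, N):
    """Matrix-vector product over Z_N."""
    return [sum(a * xi for a, xi in zip(row, x)) % N for row in A]

def inverse_2x2(A, N):
    """Inverse of a 2x2 matrix over Z_N via the adjugate (Lemma 4.2 with k = 2)."""
    (a, b), (c, d) = A
    D1 = pow((a * d - b * c) % N, -1, N)  # ValueError if gcd(D, N) != 1
    return [[D1 * d % N, -D1 * b % N], [-D1 * c % N, D1 * a % N]]

def to_nums(text):
    return [ord(ch) - ord("a") for ch in text]

def to_text(nums):
    return "".join(chr(v + ord("a")) for v in nums)

def encrypt(text, A, b, N=N):
    """(4.32): x' = A x + b on consecutive blocks of length 2."""
    xs, out = to_nums(text), []
    for i in range(0, len(xs), 2):
        y = mat_vec(A, xs[i:i + 2], N)
        out += [(yi + bi) % N for yi, bi in zip(y, b)]
    return to_text(out)

def decrypt(text, A, b, N=N):
    """(4.33): x = A^{-1} (x' - b) on consecutive blocks of length 2."""
    Ainv, ys, out = inverse_2x2(A, N), to_nums(text), []
    for i in range(0, len(ys), 2):
        diff = [(yi - bi) % N for yi, bi in zip(ys[i:i + 2], b)]
        out += mat_vec(Ainv, diff, N)
    return to_text(out)

A = [[3, 3], [2, 5]]  # det = 9 and gcd(9, 26) = 1, so A is invertible mod 26
b = [1, 7]
c = encrypt("helpme", A, b)
print(c, decrypt(c, A, b))  # ipbaxz helpme
```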

The Hill cryptosystem divides the plaintext into blocks of k characters and encrypts each plaintext unit in turn with a k-order affine transformation (A, b) over $\mathbb{Z}_N$. The advantage of this cipher is that it hides the statistical characteristics of single characters (such as the frequencies of the 26 English letters), so it resists statistical analysis of character frequencies well and is strong against ciphertext-only attacks. However, given enough known plaintext, it is not difficult to find the key (A, b), so the Hill cipher is weak against known-plaintext attacks.
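The weakness against known-plaintext attacks can be seen in a short Python sketch. For simplicity it assumes the purely linear case b = 0, block length k = 2 and N = 26: if two plaintext blocks, placed as the columns of a matrix X that is invertible mod 26, encrypt to the columns of Y, then Y = AX and hence $A = YX^{-1}$. (With b ≠ 0, one extra known block suffices, since differences of blocks eliminate b.) The helper names and the secret key are illustrative.

```python
N = 26

def inverse_2x2(M, N):
    """Inverse of a 2x2 matrix over Z_N via the adjugate (requires gcd(det, N) = 1)."""
    (a, b), (c, d) = M
    D1 = pow((a * d - b * c) % N, -1, N)
    return [[D1 * d % N, -D1 * b % N], [-D1 * c % N, D1 * a % N]]

def mat_mul(X, Y, N):
    """Product of two 2x2 matrices over Z_N."""
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) % N for j in range(2)]
            for i in range(2)]

def recover_key(plain_blocks, cipher_blocks, N=N):
    """Known-plaintext attack on the linear case b = 0: Y = A X implies A = Y X^{-1}."""
    X = [list(col) for col in zip(*plain_blocks)]   # plaintext blocks as columns
    Y = [list(col) for col in zip(*cipher_blocks)]  # ciphertext blocks as columns
    return mat_mul(Y, inverse_2x2(X, N), N)

# Hypothetical secret key and two known plaintext blocks ("he" and "lp").
A_secret = [[3, 3], [2, 5]]
plain = [[7, 4], [11, 15]]
cipher = [[sum(A_secret[i][t] * p[t] for t in range(2)) % N for i in range(2)]
          for p in plain]
print(recover_key(plain, cipher))  # [[3, 3], [2, 5]]
```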

The mathematical principles underlying the Hill cryptosystem are the following two conclusions.

Lemma 4.1 Denote the set of all k-order affine transformations on $\mathbb{Z}_N$ by $G_k$, that is

$$G_k = \{(A, b) \mid A \text{ is an invertible } k \times k \text{ matrix over } \mathbb{Z}_N,\ b \in \mathbb{Z}_N^k\}.$$

Then $G_k$ forms a group under composition of transformations, called the k-order affine transformation group of the ring $\mathbb{Z}_N$.

Proof Take A as the k-order identity matrix E and b = 0 as the k-dimensional zero vector; then (E, 0) is the identity transformation of $\mathbb{Z}_N^k \to \mathbb{Z}_N^k$ and is the identity element of $G_k$. Secondly, consider the product of two affine transformations $(A_1, b_1)$ and $(A_2, b_2)$:

$$(A_1, b_1)(A_2, b_2) = (A_1A_2, A_1b_2 + b_1) \in G_k,$$

since $A_1(A_2x + b_2) + b_1 = A_1A_2x + A_1b_2 + b_1$. Obviously, the inverse transformation of (A, b) is

$$(A, b)^{-1} = (A^{-1}, -A^{-1}b) \in G_k.$$

Therefore, $G_k$ is a group. The Lemma holds.


By the above lemma, every element $(A, b) \in G_k$ of the affine transformation group gives a Hill cryptosystem. If we select n group elements $(A_1, b_1), (A_2, b_2), \ldots, (A_n, b_n)$ in $G_k$ and let

$$(A, b) = \prod_{i=1}^{n} (A_i, b_i),$$

then encrypting with (A, b) gives a more complex Hill cryptosystem.


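Lemma 4.2 Let $A \in M_k(\mathbb{Z}_N)$ and let $D = |A|$ be the determinant of A. Then A is invertible if and only if D and N are coprime, that is $(D, N) = 1$.

Proof If $(D, N) = 1$, then there is $D_1$ such that $D_1D \equiv 1 \pmod N$. Let

$$A^* = D_1\begin{pmatrix} A_{11} & A_{21} & \cdots & A_{k1} \\ A_{12} & A_{22} & \cdots & A_{k2} \\ \vdots & \vdots & & \vdots \\ A_{1k} & A_{2k} & \cdots & A_{kk} \end{pmatrix}, \tag{4.34}$$

where $A = (a_{ij})_{k \times k}$ and $A_{ij}$ is the algebraic cofactor of $a_{ij}$. Obviously, we have

$$AA^* = A^*A = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = E.$$

So A is invertible and $A^{-1} = A^*$. Taking k = 2 as an example, if $|A| = D$ and $(D, N) = 1$, then

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad A^{-1} = \begin{pmatrix} D_1d & -D_1b \\ -D_1c & D_1a \end{pmatrix}.$$

Conversely, if A is invertible with inverse matrix $A^{-1}$, then $A^{-1}A = AA^{-1} = E$, and we get

$$|AA^{-1}| = |A||A^{-1}| \equiv 1 \pmod N.$$

So $(D, N) = 1$. The Lemma holds.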
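For general k, the adjugate construction in (4.34) can be sketched in Python as follows. This is an illustrative implementation (the names `det_mod` and `inverse_mod` are ours); it computes determinants by cofactor expansion, which mirrors the k!-term analysis of Lemma 4.3 below and is therefore practical only for small k.

```python
from math import gcd

def det_mod(M, N):
    """Determinant of a square matrix over Z_N by cofactor expansion along row 1."""
    k = len(M)
    if k == 1:
        return M[0][0] % N
    total = 0
    for j in range(k):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_mod(minor, N)
    return total % N

def inverse_mod(M, N):
    """A^{-1} = D_1 * adj(A) over Z_N with D_1 * D = 1 (mod N), as in (4.34)."""
    k = len(M)
    D = det_mod(M, N)
    if gcd(D, N) != 1:
        raise ValueError("matrix is not invertible over Z_N (Lemma 4.2)")
    D1 = pow(D, -1, N)
    if k == 1:
        return [[D1]]
    inv = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            # Entry (i, j) of the adjugate is the cofactor A_{ji}:
            # delete row j and column i, then attach the sign (-1)^(i+j).
            minor = [row[:i] + row[i + 1:] for r, row in enumerate(M) if r != j]
            inv[i][j] = D1 * (-1) ** (i + j) * det_mod(minor, N) % N
    return inv

print(inverse_mod([[3, 3], [2, 5]], 26))  # [[15, 17], [20, 9]]
```

A matrix whose determinant shares a factor with 26, such as `[[2, 0], [0, 1]]`, raises `ValueError`, exactly as Lemma 4.2 predicts.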


If k = 1, the first-order affine cryptosystem $x' = ax + b \pmod N$, where $(a, N) = 1$, includes many famous classical ciphers; in particular, when a = 1, b = 3 and N = 26, $x' = x + 3 \pmod{26}$ is the famous Caesar cipher.
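The first-order case is short enough to sketch directly in Python. Lowercase letters a to z are identified with 0 to 25 (an assumption of this sketch), and decryption uses $a^{-1} \bmod 26$, which exists by Lemma 4.2 with k = 1.

```python
from math import gcd

def affine_encrypt(text, a, b, N=26):
    """First-order affine cipher x' = a x + b (mod N) on lowercase letters."""
    if gcd(a, N) != 1:
        raise ValueError("a must be coprime to N")
    return "".join(chr((a * (ord(ch) - 97) + b) % N + 97) for ch in text)

def affine_decrypt(text, a, b, N=26):
    """Inverse map x = a^{-1} (x' - b) (mod N)."""
    a_inv = pow(a, -1, N)
    return "".join(chr(a_inv * (ord(ch) - 97 - b) % N + 97) for ch in text)

# Caesar cipher: a = 1, b = 3.
print(affine_encrypt("veni", 1, 3))  # yhql
```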

Next, we analyze the computational complexity of affine cryptography. We have the following lemma.

Lemma 4.3 Let $A = (a_{ij})_{k\times k}$ be an invertible square matrix of order $k$ over $\mathbb{Z}_N$. The number of bit operations needed to compute $A^{-1}$ by the cofactor method (4.34) satisfies

$$\mathrm{Time}(A^{-1}) = O(k^3\,k!\,\log^2 N).$$

Therefore, when $k$ is a fixed constant, this algorithm for finding $A^{-1}$ is polynomial. When $N$ is a fixed constant, it is exponential in $k$. In other words, the greater the order of the matrix, the higher the computational complexity.


Proof Because $A = (a_{ij})$ is invertible, its determinant is

$$D = |A| = \sum_{j_1j_2\cdots j_k} (-1)^{\tau(j_1j_2\cdots j_k)}\, a_{1j_1}a_{2j_2}\cdots a_{kj_k},$$

where $j_1j_2\cdots j_k$ runs over the permutations of $1, 2, \ldots, k$ and $\tau(j_1j_2\cdots j_k)$ is the inversion number of the permutation. Each summand requires $O(k^2\log^2 N)$ bit operations, and there are $k!$ summands in total. Thus

$$\mathrm{Time}(D) = O(k^2\,k!\,\log^2 N).$$


By Lemma 1.5 of Chapter 1, finding the multiplicative inverse $D_1 = D^{-1} \bmod N$ takes

$$\mathrm{Time}(D_1) = O(\log^2 N).$$


Each algebraic cofactor $A_{ij}$ of the adjoint matrix $A^*$ in (4.34) requires $O((k-1)^2(k-1)!\log^2 N)$ bit operations, and there are $k^2$ algebraic cofactors. Thus

$$\mathrm{Time}(A^*) = O(k^3\,k!\,\log^2 N).$$

So

$$\mathrm{Time}(A^{-1}) = O(k^3\,k!\,\log^2 N).$$


When $k$ is constant, this algorithm for finding $A^{-1}$ is polynomial. When $N$ is constant and $k \to \infty$, the factor $k!$ makes the algorithm exponential. The Lemma holds.
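The sketch below shows where the factor $k!$ in Lemma 4.3 comes from. It evaluates the determinant exactly as in the proof: a sum over all permutations, with each sign given by the inversion number $\tau$. It also returns the number of summands, which is $k!$. This is an illustrative Python sketch, and the example matrix is arbitrary.

```python
from itertools import permutations

def inversion_number(perm):
    """tau(j_1 ... j_k): number of pairs r < s with j_r > j_s (O(k^2) comparisons)."""
    k = len(perm)
    return sum(1 for r in range(k) for s in range(r + 1, k) if perm[r] > perm[s])

def leibniz_det(A, N):
    """D = sum over permutations of (-1)^tau * a_{1 j_1} ... a_{k j_k} (mod N).
    Returns (D, number_of_summands); the second value is k!."""
    total, terms = 0, 0
    for perm in permutations(range(len(A))):
        prod = (-1) ** inversion_number(perm)
        for row, col in enumerate(perm):
            prod = prod * A[row][col] % N
        total += prod
        terms += 1
    return total % N, terms

K = [[6, 24, 1], [13, 16, 10], [20, 17, 15]]
print(leibniz_det(K, 26))   # (25, 6)
```

Going from a $3 \times 3$ to a $10 \times 10$ matrix raises the number of summands from $6$ to $3628800$, which is the exponential growth the lemma describes.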


4.7.2 RSA


In 1976, two researchers at Stanford University, Diffie and Hellman, put forward a new idea for cryptosystem design. In short, the encryption and decryption algorithms are designed on the principle of asymmetry. The following schematic diagram illustrates this:

$$P \xrightarrow{\ f\ } C \xrightarrow{\ f^{-1}\ } P, \tag{4.35}$$

where $P$ is the plaintext space, $C$ is the ciphertext space, $f$ is the encryption algorithm and $f^{-1}$ is the decryption algorithm. A cryptosystem is called symmetric in either of two cases. The first is when $f$ and $f^{-1}$ are the same algorithm, such as an involution in the binary system. The second is when the decryption algorithm $f^{-1}$ is easily deduced from the encryption algorithm $f$; an example is the matrix encryption algorithm of the previous section when the matrix order is small. The essence of a symmetric cryptosystem is that the encryption key and the decryption key are equally important to keep secret.

Diffie and Hellman proposed a different arrangement. Suppose $f \neq f^{-1}$, $f$ is an encryption algorithm that is easy to implement, and $f^{-1}$ is a decryption algorithm that is very difficult to compute. Then the key can be divided into an encryption key and a decryption key, and even if the encryption key is made public, the security of the decryption key is not affected. Such an encryption algorithm $f$ is called an asymmetric or trapdoor one-way function. A cipher that uses an asymmetric $f$ is called an asymmetric cipher or public key cryptosystem. Owing to this bold innovation of Diffie and Hellman, cryptography entered a new era: the era of public key cryptography. Its basic feature is that a cipher changes from serving a few users to serving many users, which greatly improves the efficiency and social value of ciphers.

How can an asymmetric encryption algorithm be designed? Rivest, Shamir and Adleman jointly proposed the first secure and practical one-way encryption algorithm, known in academic circles as the RSA algorithm. This public key cryptosystem has been widely used in cryptographic design and has become an international standard algorithm. Besides being simple and practical, its security rests entirely on the difficulty of factoring huge integers into primes.

Let $p, q$ be two large and relatively safe primes. Assume that each is huge, with binary length

$$k > 1024 \text{ bits}. \tag{4.36}$$

Let $n = pq$ and let $\varphi(n)$ be Euler's function. Then

$$\varphi(n) = (p-1)(q-1) = n + 1 - p - q.$$

Randomly select a positive integer $e$ satisfying

$$1 < e < \varphi(n), \quad (e, \varphi(n)) = 1. \tag{4.37}$$

The large primes $p$ and $q$ and the integer $e$ satisfying (4.37) are generated randomly. That is, $p$, $q$ and $e$ are selected with the help of a computer random number generator (or pseudo-random number generator). The computational complexity of this step is as follows.


Lemma 4.4 Let $p$ and $q$ be randomly generated large primes, $n = pq$, $\varphi(n)$ Euler's function, and $1 < e < \varphi(n)$ with $(e, \varphi(n)) = 1$. Then




$$\mathrm{Time}(\text{select out } n) = O(\log^3 n), \qquad \mathrm{Time}(\text{find } e) = O(\log^2 n).$$


Proof Use the random number generator to produce a huge integer $m$ of the size required by (4.36). Then test whether $m, m+1, m+2, \ldots$ is prime. By the prime number theorem, the density of primes near $m$ is about $O\!\left(\frac{1}{\log m}\right)$, so only about $O(\log m)$ tests are needed to find the required prime $p$. By Lemma 1.5 of Chapter 1,

$$\mathrm{Time}(\text{find prime } p) = O(\log^2 m) = O(\log^2 n).$$

Similarly,

$$\mathrm{Time}(\text{find prime } q) = O(\log^2 n).$$

Because $n = pq$,

$$\mathrm{Time}(\text{select out } n) = O(\log^3 n).$$

Once $n$ is fixed, $\varphi(n) = (p-1)(q-1)$. The random number generator produces a positive integer $a$ with $1 < a < \varphi(n)$, and we then test in turn whether $a, a+1, a+2, \ldots$ is coprime to $\varphi(n)$. Again by the prime number theorem, integers coprime to $\varphi(n)$ occur near $a$ with frequency about $O\!\left(\frac{1}{\log a}\right)$, so only $O(\log a)$ tests are needed to obtain the required $e$. Thus

$$\mathrm{Time}(\text{select out } e) = O(\log^2 a) = O(\log^2 n).$$


The Lemma holds. 
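The proof of Lemma 4.4 uses two searches. The first tests $m, m+1, m+2, \ldots$ until a prime appears. The second tests $a, a+1, \ldots$ until it finds a value coprime to $\varphi(n)$. Both can be sketched in Python as follows. The text does not fix a primality test, so the Miller-Rabin probabilistic test is substituted here as an assumption. The function names are hypothetical, and a production system should use a vetted cryptographic library rather than this sketch.

```python
import secrets
from math import gcd

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test (an assumed choice of test)."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2          # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                          # a witnesses that n is composite
    return True

def next_prime_from(m):
    """Test m, m + 1, m + 2, ... in turn, as in the proof of Lemma 4.4."""
    while not is_probable_prime(m):
        m += 1
    return m

def random_prime(bits):
    """Random starting point with its top bit set, then search upward."""
    return next_prime_from(secrets.randbits(bits) | (1 << (bits - 1)))

def choose_e(phi):
    """Random a with 1 < a < phi, then test a, a + 1, ... for coprimality with phi."""
    a = secrets.randbelow(phi - 3) + 2
    while gcd(a, phi) != 1:                       # stops by phi - 1 at the latest
        a += 1
    return a

p, q = random_prime(512), random_prime(512)
n, phi = p * q, (p - 1) * (q - 1)
e = choose_e(phi)
```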
After $p$, $q$, $n = pq$ and $e$ have been randomly selected, the condition $(e, \varphi(n)) = 1$ guarantees that $d = e^{-1} \bmod \varphi(n)$ exists, that is,

$$de \equiv 1 \pmod{\varphi(n)}, \quad 1 < d < \varphi(n). \tag{4.38}$$

Definition 4.10 After $n = pq$ is randomly determined, $P_e = (n, e)$ is called the public key and $P_d = (n, d)$ the private key. Equivalently, $e$ is called the public key and $d$ the private key.


By Lemma 1.5 of Chapter 1, the number of bit operations required to compute $d = e^{-1} \bmod \varphi(n)$ is $\mathrm{Time}(d) = O(\log^2 \varphi(n)) = O(\log^2 n)$. Combined with Lemma 4.4, this gives:

Corollary 4.3 The computational complexity of randomly generating the public key $P_e = (n, e)$ and the private key $P_d = (n, d)$ is polynomial.
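The private exponent $d = e^{-1} \bmod \varphi(n)$ of (4.38) can be computed with the extended Euclidean algorithm. A minimal Python sketch follows. The toy values $p = 61$, $q = 53$ (so $\varphi(n) = 3120$) and $e = 17$ are assumptions chosen for illustration only.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    x0, y0, x1, y1 = 1, 0, 0, 1
    while b:
        qt = a // b
        a, b = b, a - qt * b
        x0, x1 = x1, x0 - qt * x1
        y0, y1 = y1, y0 - qt * y1
    return a, x0, y0

def private_exponent(e, phi):
    """d = e^{-1} mod phi(n), as in (4.38)."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e must be coprime to phi(n)")
    return x % phi

print(private_exponent(17, 3120))   # 2753
```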


The key mathematical principle used in the design of RSA is the generalized Euler congruence theorem. Let $n > 1$ be a positive integer and $(m, n) = 1$. By Euler's theorem,

$$m^{\varphi(n)} \equiv 1 \pmod n \ \Longrightarrow\ m^{k\varphi(n)+1} \equiv m \pmod n. \tag{4.39}$$

We show that when $n$ is a square-free positive integer, (4.39) holds for all positive integers $m$, whether $(m, n) = 1$ or $(m, n) > 1$. The proof below treats $n = pq$, the case used by RSA.


Lemma 4.5 If $n = pq$ is the product of two distinct primes, then for all positive integers $m$ and $k$,

$$m^{k\varphi(n)+1} \equiv m \pmod n. \tag{4.40}$$
Proof If $(m, n) = 1$, then by Euler's theorem,

$$m^{k\varphi(n)} \equiv 1 \pmod n \ \Longrightarrow\ m^{k\varphi(n)+1} \equiv m \pmod n.$$

It remains to consider $(m, n) > 1$. Because $n = pq$, either $(m, n) = p$, $(m, n) = q$, or $(m, n) = n$. If $(m, n) = n$, then $m \equiv 0 \pmod n$ and (4.40) holds trivially. Without loss of generality, let $(m, n) = p$. Then $m = pt$ with $q \nmid t$, so $(m, q) = 1$. By Euler's theorem,

$$m^{q-1} \equiv 1 \pmod q \ \Longrightarrow\ m^{\varphi(n)} = m^{(p-1)(q-1)} \equiv 1 \pmod q.$$

Hence for every $k \geq 1$,

$$m^{k\varphi(n)} \equiv 1 \pmod q.$$

We write

$$m^{k\varphi(n)} = rq + 1.$$

Multiplying both sides by $m = pt$ gives

$$m^{k\varphi(n)+1} = rtpq + m = rtn + m.$$

This implies

$$m^{k\varphi(n)+1} \equiv m \pmod n.$$

This completes the proof of the lemma.
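Lemma 4.5 is easy to test numerically for small moduli, including the values of $m$ that share a factor with $n$. The following Python check is a sanity test under small assumed primes, not a proof.

```python
def check_lemma_4_5(p, q, max_k=3):
    """Check m^(k*phi(n)+1) ≡ m (mod n) for every m in Z_n and 1 <= k <= max_k,
    including the m with gcd(m, n) > 1; n = p q with p != q prime."""
    n, phi = p * q, (p - 1) * (q - 1)
    return all(pow(m, k * phi + 1, n) == m for m in range(n) for k in range(1, max_k + 1))

print(check_lemma_4_5(5, 7), check_lemma_4_5(11, 13))   # True True
```

The square-free hypothesis matters. For the modulus $n = 9$ and $m = 3$, $m^{\varphi(9)+1} = 3^7 \equiv 0 \pmod 9$, not $3$.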


With the above preparations, the workflow of the RSA cryptosystem can be divided into the following three steps:

(1) Suppose $A$ is a user of RSA. $A$ randomly generates two huge primes $p = p(A)$ and $q = q(A)$, and sets $n = n(A) = pq$ and $\varphi(n) = (p-1)(q-1)$. Then $A$ randomly generates a positive integer $e = e(A)$ satisfying $1 < e < \varphi(n)$ and $(e, \varphi(n)) = 1$, and computes $d = e^{-1} \pmod{\varphi(n)}$ with $1 < d < \varphi(n)$. User $A$ destroys the two primes $p$ and $q$ and keeps only the three numbers $n, e, d$. After publishing $P_e = (n, e)$ as the public key, $A$ holds the private key $P_d = (n, d)$ and keeps it strictly confidential.

(2) Another RSA user $B$ sends encrypted information to user $A$ using $A$'s known public key $(n, e)$. $B$ takes $P = \mathbb{Z}_n$ as the plaintext space and encrypts each $m \in \mathbb{Z}_n$. The encryption algorithm $c = f(m)$ is defined as

$$c = f(m) \equiv m^e \pmod n, \quad 0 \leq c < n, \tag{4.41}$$

where $c$ is the ciphertext.

(3) After receiving the ciphertext $c$ sent by user $B$, user $A$ decrypts it with its own private key $(n, d)$. The decryption algorithm $f^{-1}$ is defined as

$$m = f^{-1}(c) \equiv c^d \pmod n, \quad 0 \leq m < n. \tag{4.42}$$

User $A$ thus obtains the plaintext $m$ sent by user $B$, which completes RSA encryption and decryption.
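The three steps above can be sketched with toy parameters. This is unpadded "textbook" RSA exactly as in (4.41) and (4.42), which is not secure in practice as written. The primes $p = 61$, $q = 53$ and the exponent $e = 17$ are illustrative assumptions, and `pow(e, -1, phi)` needs Python 3.8+.

```python
# Toy parameters, assumed for illustration only; real RSA needs p, q of 1024+ bits.
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)       # n = 3233, phi(n) = 3120
e = 17                                  # 1 < e < phi(n), gcd(e, phi(n)) = 1
d = pow(e, -1, phi)                     # private exponent d = e^{-1} mod phi(n)

def rsa_encrypt(m, public_key):
    n, e = public_key
    return pow(m, e, n)                 # (4.41): c = m^e mod n

def rsa_decrypt(c, private_key):
    n, d = private_key
    return pow(c, d, n)                 # (4.42): m = c^d mod n

c = rsa_encrypt(65, (n, e))
print(c, rsa_decrypt(c, (n, d)))        # 2790 65
```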


The correctness and uniqueness of RSA encryption are guaranteed by the following lemma.

Lemma 4.6 The encryption algorithm $f$ defined by (4.41) is a one-to-one correspondence $\mathbb{Z}_n \to \mathbb{Z}_n$, and $f^{-1}$ defined by (4.42) is the inverse mapping of $f$.

Proof By Lemma 4.5, for all $m \in \mathbb{Z}_n$ and every positive integer $k$,

$$m^{k\varphi(n)+1} \equiv m \pmod n.$$

Because $ed \equiv 1 \pmod{\varphi(n)}$, we can write

$$ed = k\varphi(n) + 1.$$

By (4.41),

$$c^d \equiv m^{ed} = m^{k\varphi(n)+1} \equiv m \pmod n.$$

That is, for all $m \in \mathbb{Z}_n$,

$$f^{-1}(f(m)) = m.$$

In the same way, for $m = f^{-1}(c) \equiv c^d \pmod n$,

$$m^e \equiv c^{de} = c^{k\varphi(n)+1} \equiv c \pmod n.$$

In other words,

$$f(f^{-1}(c)) = c.$$

By Lemma 1.1 of Chap. 1, $f$ is a one-to-one correspondence $\mathbb{Z}_n \to \mathbb{Z}_n$, with $ff^{-1} = 1$ and $f^{-1}f = 1$. The Lemma holds.




Another very important application of RSA is the digital signature. In the RSA workflow, the encryption algorithm defined in (4.41) is based on the public key $(n_A, e_A)$ of user $A$. We denote this $f$ by $f_A$, and the decryption algorithm defined in (4.42) by $f_A^{-1}$.

The workflow of an RSA digital signature is as follows. User $A$ sends his digital signature to user $B$, that is, $A$ sends an encrypted message to $B$. Let $P_e(A) = (n_A, e_A)$ be the public key of $A$ and $P_d(A) = (n_A, d_A)$ the private key of $A$. Similarly, $P_e(B) = (n_B, e_B)$ is the public key of $B$ and $P_d(B) = (n_B, d_B)$ is the private key of $B$. Then the digital signature sent by user $A$ to user $B$ is

$$\begin{cases} f_Bf_A^{-1}(m), & \text{if } n_A < n_B, \\ f_A^{-1}f_B(m), & \text{if } n_A > n_B, \end{cases} \tag{4.43}$$

where $m$ is the message signed by user $A$. So that both maps are defined, $m$ is taken in $\mathbb{Z}_{n_A}$ in the first case and in $\mathbb{Z}_{n_B}$ in the second. After receiving this digital signature, user $B$ verifies that (4.43) is the genuine signature of user $A$. There are two procedures, depending on whether $n_A < n_B$ or $n_A > n_B$.

(i) If $n_A < n_B$, user $B$ first decrypts with his private key $f_B^{-1} = (n_B, d_B)$ and then applies user $A$'s public key $f_A = (n_A, e_A)$. The verification is

$$f_Af_B^{-1}\big(f_Bf_A^{-1}(m)\big) = f_Af_A^{-1}(m) = m.$$

(ii) If $n_A > n_B$, user $B$ first applies user $A$'s public key $f_A = (n_A, e_A)$, then decrypts and verifies with his own private key $f_B^{-1} = (n_B, d_B)$:

$$f_B^{-1}f_A\big(f_A^{-1}f_B(m)\big) = f_B^{-1}f_B(m) = m.$$
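The case split in (4.43) ensures that the intermediate value always fits in the larger modulus. Below is a hedged Python sketch with two toy key pairs, $n_A = 61 \cdot 53 = 3233$ and $n_B = 89 \cdot 97 = 8633$, chosen only for illustration. The helper `f(key, x)` and the other function names are hypothetical: `f` is the RSA map $x \mapsto x^k \bmod n$, which acts as $f$ with a public key and as $f^{-1}$ with a private key.

```python
def make_key(p, q, e):
    """Toy key pair ((n, e), (n, d)) from small distinct primes (illustration only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))

def f(key, x):
    """RSA map x -> x^k mod n for key = (n, k)."""
    n, k = key
    return pow(x, k, n)

pub_A, priv_A = make_key(61, 53, 17)    # n_A = 3233
pub_B, priv_B = make_key(89, 97, 5)     # n_B = 8633

def sign_for(m, priv_A, pub_A, pub_B):
    """Signature sent from A to B, following (4.43)."""
    if pub_A[0] < pub_B[0]:
        return f(pub_B, f(priv_A, m))   # f_B f_A^{-1}(m)
    return f(priv_A, f(pub_B, m))       # f_A^{-1} f_B(m)

def verify(s, pub_A, priv_B):
    """B recovers m following case (i) or case (ii)."""
    if pub_A[0] < priv_B[0]:
        return f(pub_A, f(priv_B, s))   # (i): f_A f_B^{-1}(s)
    return f(priv_B, f(pub_A, s))       # (ii): f_B^{-1} f_A(s)

m = 1234                                # must lie in Z_min(n_A, n_B)
s = sign_for(m, priv_A, pub_A, pub_B)
print(verify(s, pub_A, priv_B) == m)    # True
```

Swapping the roles of the two key pairs exercises case (ii) instead of case (i).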


The security of RSA rests on the difficulty of factoring $n$. Every user selects large primes $p$ and $q$, sets $n = pq$, then destroys $p$ and $q$ and retains only the public key $(n, e)$ and his own secret key $(n, d)$. Even if $(n, e)$ is published, outsiders know only $n$ and not $\varphi(n)$, so they cannot obtain the private key $(n, d)$. This is because computing $\varphi(n)$ relies on the prime factorization of $n$, as Euler's product formula shows:

$$\varphi(n) = n\prod_{p \mid n}\left(1 - \frac{1}{p}\right).$$

Our knowledge of prime numbers is limited; for instance, no general formula producing infinitely many primes has been found. Deciding whether a huge integer is prime can nevertheless be done efficiently, which is what makes the random prime generation of Lemma 4.4 practical. By contrast, no polynomial-time algorithm is known for the prime factorization of a huge integer $n$.
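The sketch below illustrates why the security of RSA rests on factoring. It factors a toy modulus by trial division and then recomputes $d$ from $\varphi(n)$. Trial division takes time exponential in the bit length of $n$, which is why this attack is hopeless against 1024-bit moduli. The modulus $3233 = 61 \cdot 53$ is an assumed toy value.

```python
from math import isqrt

def trial_factor(n):
    """Find a nontrivial factorization n = p q by trial division."""
    for p in range(2, isqrt(n) + 1):
        if n % p == 0:
            return p, n // p
    raise ValueError("n is prime")

def recover_private_key(n, e):
    """Once n is factored, phi(n) = (p - 1)(q - 1) and hence d follow at once."""
    p, q = trial_factor(n)
    return pow(e, -1, (p - 1) * (q - 1))

print(recover_private_key(3233, 17))   # 2753
```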
4.7.3 Discrete Logarithm

Let $G$ be a finite group and $b, y \in G$ two group elements. Let $t$ be the least positive integer satisfying $b^t = 1$; $t$ is called the order of $b$, denoted $t = o(b)$. If there is an $x$, $1 \leq x \leq o(b)$, such that $y = b^x$, then $x$ is called the discrete logarithm of $y$ to the base $b$. Given $b \in G$ and $0 \leq x \leq o(b)$, it is easy to compute $y = b^x$. Conversely, for a given group element $y$, it is generally very difficult to find the discrete logarithm of $y$ to the base $b$.

For this reason, encryption based on the discrete logarithm has become the mainstream approach in public key cryptosystems, including the famous ElGamal cryptosystem and elliptic curve cryptosystems. The ElGamal cryptosystem uses the discrete logarithm in the multiplicative group formed by all nonzero elements of a finite field $\mathbb{F}_q$. Elliptic curve cryptography uses the discrete logarithm in the Mordell group of an elliptic curve. Here we mainly discuss ElGamal cryptography; elliptic curve cryptography is discussed in Chap. 6. We first prove several basic results on finite fields.
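The asymmetry between computing $y = b^x$ and recovering $x$ shows up even in a small example in the group $\mathbb{Z}_p^*$ ($p$ prime). The brute-force search below needs up to $o(b)$ multiplications, while `pow(b, x, p)` needs only $O(\log x)$. The prime $p = 101$ and base $b = 2$ are illustrative assumptions.

```python
def order(b, p):
    """o(b): least t >= 1 with b^t ≡ 1 (mod p), for b in Z_p^* (p prime, p not dividing b)."""
    t, x = 1, b % p
    while x != 1:
        x = x * b % p
        t += 1
    return t

def brute_force_log(y, b, p):
    """Discrete logarithm x, 1 <= x <= o(b), with y = b^x, by exhaustive search."""
    power = b % p
    for x in range(1, order(b, p) + 1):
        if power == y % p:
            return x
        power = power * b % p
    return None                          # y is not a power of b

print(brute_force_log(pow(2, 57, 101), 2, 101))   # 57
```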


Lemma 4.7 Let $\mathbb{F}_q$ be a finite field of $q$ elements, where $q = p^n$ is a power of a prime $p$, and let $\mathbb{F}_q^* = \mathbb{F}_q \setminus \{0\}$ be the set of nonzero elements of $\mathbb{F}_q$. Then $\mathbb{F}_q^*$ is a cyclic group of order $q - 1$ under multiplication. A generating element $g$ of $\mathbb{F}_q^*$ is called a generator of the finite field $\mathbb{F}_q$.

Proof By Lagrange's theorem, the number of zeros of a polynomial over any field does not exceed its degree. The set $\mathbb{F}_q^*$ is a finite group of order $q - 1$ under multiplication. A finite group in which $x^d = 1$ has at most $d$ solutions for every $d$ is cyclic. So to prove that $\mathbb{F}_q^*$ is cyclic, it suffices to show that for every divisor $d \mid q - 1$, the equation $x^d = 1$ has at most $d$ solutions in $\mathbb{F}_q^*$. This follows from Lagrange's theorem: the polynomial $x^d - 1$ has at most $d$ zeros in the whole field $\mathbb{F}_q$, hence at most $d$ zeros in $\mathbb{F}_q^*$. So $\mathbb{F}_q^*$ is a finite cyclic group. The Lemma holds.
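Lemma 4.7 guarantees that a generator exists. A standard way to test a candidate $g$ is to check that $g^{(p-1)/r} \neq 1$ for every prime $r \mid p - 1$. The minimal Python sketch below uses this test and, as a simplifying assumption, restricts to prime fields $\mathbb{F}_p$ (that is, $n = 1$).

```python
def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_generator(g, p):
    """g generates the cyclic group F_p^* iff g^((p-1)/r) != 1 for every prime r | p - 1."""
    return all(pow(g, (p - 1) // r, p) != 1 for r in prime_factors(p - 1))

def find_generator(p):
    """Smallest generator of F_p^*."""
    return next(g for g in range(2, p) if is_generator(g, p))

print(find_generator(101))   # 2
```

Since $\mathbb{F}_p^*$ is cyclic of order $p - 1$, exactly $\varphi(p-1)$ of its elements pass this test. For $p = 101$ that is $\varphi(100) = 40$.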


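Lemma 4.8 Let $\mathbb{F}_q$ be a $q$-element finite field, $q = p^n$, let $\mathbb{F}_p \subset \mathbb{F}_q$ be its prime subfield, and let $\mathbb{F}_p^* \leq \mathbb{F}_q^*$ be the corresponding subgroup. If $g$ is a generator of $\mathbb{F}_q^*$, then $g' = g^{\frac{q-1}{p-1}}$ is a generator of $\mathbb{F}_p^*$.

Proof Since $g$ is a generator of $\mathbb{F}_q^*$, $o(g) = q - 1$. Let $g' = g^{\frac{q-1}{p-1}}$. Then

$$o(g') = \frac{q-1}{\left(q-1, \frac{q-1}{p-1}\right)} = p - 1.$$

Thus $(g')^{p-1} = 1$, that is, $(g')^p = g'$, so $g' \in \mathbb{F}_p$. Because $\mathbb{F}_p^*$ is a cyclic group of order $p - 1$ and $o(g') = p - 1$, we get $\mathbb{F}_p^* = \langle g' \rangle$, i.e., $g'$ is a generator of $\mathbb{F}_p^*$. The Lemma holds.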


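Lemma 4.9 Let $\mathbb{F}_q$ be a $q$-element finite field, $q = p^n$. For any $d \mid n$, let

$$A_d = \{p(x) \in \mathbb{F}_p[x] \mid \deg p(x) = d,\ p(x) \text{ is a monic irreducible polynomial}\}$$

and

$$f_d(x) = \prod_{p(x) \in A_d} p(x).$$

Then

$$x^q - x = x^{p^n} - x = \prod_{d \mid n} f_d(x). \tag{4.44}$$

Proof We know that

$$x^{p^d} - x \mid x^{p^n} - x \iff d \mid n.$$

Let $p(x) \in A_d$, that is, $p(x) \in \mathbb{F}_p[x]$ is a monic irreducible polynomial with $\deg p(x) = d$. Let $\alpha$ be a root of $p(x)$. Adjoining $\alpha$ to $\mathbb{F}_p$ gives a finite extension field $\mathbb{F}_p(\alpha)$ of degree $d$ over $\mathbb{F}_p$. If $d \mid n$, then

$$\mathbb{F}_p(\alpha) = \mathbb{F}_{p^d} \subset \mathbb{F}_{p^n},$$

so $\alpha \in \mathbb{F}_{p^n}$. Because all zeros of $p(x)$ lie in $\mathbb{F}_{p^n}$, we have $p(x) \mid x^{p^n} - x$. Every $p(x)$ in $A_d$ satisfies $p(x) \mid x^{p^n} - x$, so

$$f_d(x) = \prod_{p(x) \in A_d} p(x) \,\Big|\, x^{p^n} - x.$$

Conversely, let $p(x)$ be a monic irreducible polynomial with $\deg p(x) = d$. If $p(x) \mid x^{p^n} - x$, then the zeros of $p(x)$ all lie in $\mathbb{F}_{p^n}$. Let $\alpha$ be a zero of $p(x)$. Then $\mathbb{F}_p(\alpha) \subset \mathbb{F}_{p^n}$, that is, $\mathbb{F}_{p^d} \subset \mathbb{F}_{p^n}$, so $d \mid n$. Finally, $x^{p^n} - x$ has no repeated factors (its derivative is $-1$), so it is the product of the distinct monic irreducible polynomials dividing it:

$$x^{p^n} - x = \prod_{d \mid n} f_d(x).$$

The Lemma holds.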


Lemma 4.10 Let $N_p(d)$ denote the number of monic irreducible polynomials of degree $d$ in $\mathbb{F}_p[x]$. Then

$$N_p(n) = \frac{1}{n}\sum_{d \mid n} \mu(d)\, p^{n/d}, \tag{4.45}$$

where $\mu$ is the Möbius function.

Proof By Lemma 4.9 and (4.44),

$$x^{p^n} - x = \prod_{d \mid n} f_d(x).$$

Comparing the degrees of the polynomials on both sides gives

$$p^n = \sum_{d \mid n} d\,N_p(d).$$

By the Möbius inversion formula,

$$nN_p(n) = \sum_{d \mid n} \mu(d)\, p^{n/d},$$

which is (4.45). The Lemma holds.


Corollary 4.4 If $d$ is a prime number, the number of monic irreducible polynomials of degree $d$ in $\mathbb{F}_p[x]$ is $\frac{1}{d}(p^d - p)$, that is,

$$N_p(d) = \frac{1}{d}(p^d - p), \quad \text{if } d \text{ is a prime number}.$$

Proof By (4.45),

$$N_p(d) = \frac{1}{d}\sum_{\delta \mid d} \mu(\delta)\, p^{d/\delta} = \frac{1}{d}(p^d - p).$$

The Corollary holds.
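Formula (4.45) and Corollary 4.4 can be evaluated directly. The Python sketch below computes $N_p(n)$ with the Möbius function; the helper names are illustrative.

```python
def mobius(n):
    """Möbius function mu(n) by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0              # n has a square factor
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result

def num_irreducible(p, n):
    """N_p(n) = (1/n) * sum over d | n of mu(d) p^(n/d), formula (4.45)."""
    total = sum(mobius(d) * p ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n

print([num_irreducible(2, n) for n in range(1, 9)])   # [2, 1, 2, 3, 6, 9, 18, 30]
```

The identity $p^n = \sum_{d \mid n} d\,N_p(d)$ from the proof makes a convenient consistency check.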


Based on the above basic results on finite fields, we introduce two methods for computing discrete logarithms. The first is the Silver-Pohlig-Hellman method for smooth group orders. The second is the so-called exponential integral method, known in the English literature as the index calculus method.

Silver-Pohlig-Hellman

Let $\mathbb{F}_q$ be a $q$-element finite field and $b$ a generator, that is, $\mathbb{F}_q^* = \langle b \rangle$, with

$$o(b) = |\mathbb{F}_q^*| = q - 1 = p_1^{\alpha_1}p_2^{\alpha_2}\cdots p_s^{\alpha_s}, \tag{4.46}$$

where the $p_i$ are distinct primes. If every prime factor $p \mid q - 1$ is relatively "small", the positive integer $q - 1$ is called smooth. When $q - 1$ is smooth, we compute, for each prime factor $p$, all $p$-th roots of unity $r_{p,j}$ in $\mathbb{F}_q^*$, where

$$r_{p,j} = b^{\frac{j(q-1)}{p}}, \quad 1 \leq j \leq p. \tag{4.47}$$

Denote by $R(p) = \{r_{p,j} \mid 1 \leq j \leq p\}$ the set of $p$-th roots of unity in $\mathbb{F}_q^*$. Collecting these gives a table of roots of unity in $\mathbb{F}_q^*$:

$$\text{root of unity table } R = [R(p_1), R(p_2), \ldots, R(p_s)]. \tag{4.48}$$
Now let us look at how to compute discrete logarithms in $\mathbb{F}_q^*$. Let $y \in \mathbb{F}_q^*$, and let $m$ be the discrete logarithm of $y$ to the base $b$, that is, $y = b^m$. Given $y$ and $b$, we want the value of $m$ ($1 \leq m \leq q - 1$). Use the prime factorization (4.46) of $q - 1$. If for each $i$ ($1 \leq i \leq s$) we know the least nonnegative residue $m_i = m \bmod p_i^{\alpha_i}$, then by the Chinese remainder theorem there is a unique $m \bmod q - 1$ such that

$$m \equiv m_i \pmod{p_i^{\alpha_i}}, \quad \forall\, i,\ 1 \leq i \leq s.$$

This determines the discrete logarithm $m$ of $y$. The question now is: given $p^\alpha \parallel q - 1$, how do we determine $m \bmod p^\alpha$? Let

$$m \bmod p^\alpha = m_0 + m_1p + m_2p^2 + \cdots + m_{\alpha-1}p^{\alpha-1}, \quad 0 \leq m_i < p,$$

be the least nonnegative residue of $m$ modulo $p^\alpha$. We determine each $m_i$ in turn.

First, we calculate $m_0$. Because $y = b^m$,

$$y^{\frac{q-1}{p}} = b^{\frac{m(q-1)}{p}} = b^{\frac{m_0(q-1)}{p}}.$$

That is, $y^{\frac{q-1}{p}}$ is a $p$-th root of unity in $\mathbb{F}_q^*$. Comparing it with the table $R$, we find $y^{\frac{q-1}{p}} = r_{p,j}$ for some $j$, $1 \leq j \leq p$. Then $m_0 \equiv j \pmod p$, which determines $m_0$.

Next, to calculate $m_1$, let $y_1 = y/b^{m_0} = b^{m - m_0}$. The discrete logarithm of $y_1$ is $m - m_0$, and

$$m - m_0 \equiv m_1p + m_2p^2 + \cdots + m_{\alpha-1}p^{\alpha-1} \pmod{p^\alpha},$$

so

$$y_1^{\frac{q-1}{p^2}} = b^{\frac{(m-m_0)(q-1)}{p^2}} = b^{\frac{m_1(q-1)}{p}}.$$

In other words, $y_1^{\frac{q-1}{p^2}}$ is a $p$-th root of unity in $\mathbb{F}_q^*$, and comparing it with the table $R$ determines $m_1$. Continuing in this way, we calculate $m_2, \ldots, m_{\alpha-1}$ in turn, which gives $m \bmod p^\alpha$. The Chinese remainder theorem then yields the discrete logarithm $m$ of $y$ to the base $b$.
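The Silver-Pohlig-Hellman procedure above has three parts: the root-of-unity table, digit-by-digit recovery of $m \bmod p^\alpha$, and the Chinese remainder theorem. It can be sketched in Python for a prime field $\mathbb{F}_q$ with $q$ prime. The prime $q = 8101$, whose order $q - 1 = 2^2 \cdot 3^4 \cdot 5^2$ is smooth, is an assumed example. The factorization of $q - 1$ is supplied by hand, and Python 3.8+ is assumed for modular inverses via `pow`.

```python
def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        t = (r - x) * pow(M, -1, m) % m            # choose t so that x + M t ≡ r (mod m)
        x, M = x + M * t, M * m
    return x % M

def find_generator_from_factors(q, factors):
    """Smallest generator of F_q^* (q prime), given the prime factors of q - 1."""
    return next(g for g in range(2, q) if all(pow(g, (q - 1) // r, q) != 1 for r in factors))

def pohlig_hellman(y, b, q, factors):
    """Discrete log of y to base b in F_q^* (q prime, b a generator),
    where factors = {p: alpha} describes q - 1 = prod p**alpha as in (4.46)."""
    residues, moduli = [], []
    for p, alpha in factors.items():
        # Table R(p) of (4.47): r_{p,j} = b^(j(q-1)/p), indexed here by j = 0, ..., p - 1.
        table = {pow(b, j * (q - 1) // p, q): j for j in range(p)}
        m_mod, y_i = 0, y
        for i in range(alpha):
            # y_i = y / b^(m_0 + ... + m_{i-1} p^(i-1)); raising it to (q-1)/p^(i+1)
            # leaves the p-th root of unity b^(m_i (q-1)/p).
            m_i = table[pow(y_i, (q - 1) // p ** (i + 1), q)]
            m_mod += m_i * p ** i
            y_i = y_i * pow(b, -m_i * p ** i, q) % q
        residues.append(m_mod)
        moduli.append(p ** alpha)
    return crt(residues, moduli)

q, factors = 8101, {2: 2, 3: 4, 5: 2}              # q - 1 = 8100 = 2^2 * 3^4 * 5^2
b = find_generator_from_factors(q, factors)
print(pohlig_hellman(pow(b, 1234, q), b, q, factors))   # 1234
```

All lookups work on tables of size $p \le 5$, which is why smoothness of $q - 1$ makes the method fast.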


Exponential integral method

Let $\mathbb{F}_q$ be the finite field of $q$ elements, $q = p^n$, where $p$ is a relatively small prime and $n$ is a large positive integer chosen so that $q$ meets the required security level. Let $\mathbb{F}_p$ be the finite field of $p$ elements; we can regard $\mathbb{F}_q$ as an extension of degree $n$ of $\mathbb{F}_p$. By the theory of finite field extensions, $\mathbb{F}_q$ is isomorphic to a quotient ring of the polynomial ring $\mathbb{F}_p[T]$ over $\mathbb{F}_p$. Let $f(T) \in \mathbb{F}_p[T]$ be a monic irreducible polynomial of degree $n$. Then

$$\mathbb{F}_q \cong \mathbb{F}_p[T]/\langle f(T)\rangle = \{a_0 + a_1T + \cdots + a_{n-1}T^{n-1} \mid a_i \in \mathbb{F}_p\}. \tag{4.49}$$

Therefore, any element $a$ of $\mathbb{F}_q$ corresponds to a polynomial $a(T)$ over $\mathbb{F}_p$ with $\deg a(T) \leq n - 1$. Let $b \in \mathbb{F}_q$ be a generator of $\mathbb{F}_q^*$, $b = b(T)$. If $a_0 \in \mathbb{F}_p$ is a constant polynomial, $a_0$ is called a constant in $\mathbb{F}_q$.

By Lemma 4.8, the discrete logarithms of the constants in $\mathbb{F}_q$ are easy to determine. Let $b' = b^{\frac{q-1}{p-1}}$, which by Lemma 4.8 is a generator of $\mathbb{F}_p^*$. If $m'$ is the discrete logarithm of the constant $a_0 \in \mathbb{F}_p^*$ to the base $b'$, then $m = m'\frac{q-1}{p-1}$ is the discrete logarithm of $a_0 \in \mathbb{F}_q^*$ to the base $b$. Write $m'(a_0)$ for the discrete logarithm of $a_0$ to the base $b'$. Since $p$ is small, we can easily compute and list the discrete logarithms of all constants in $\mathbb{F}_p^*$:

$$L_0 = \left\{ m'(a_0)\,\frac{q-1}{p-1} \;\middle|\; a_0 \in \mathbb{F}_p^* \right\}. \tag{4.50}$$
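The representation (4.49) and the constant table (4.50) can be made concrete in a small field. The Python sketch below assumes $p = 5$, $n = 3$ and $f(T) = T^3 + T + 1$, which is irreducible over $\mathbb{F}_5$ because it has no root there. Then $q = 125$ and $q - 1 = 2^2 \cdot 31$. The sketch finds a generator $b(T)$, forms $b' = b^{(q-1)/(p-1)}$ as in Lemma 4.8, and builds $L_0$. All names are illustrative.

```python
from itertools import product

P, N = 5, 3              # assumed example: q = 5^3 = 125
F_LOW = [1, 1, 0]        # f(T) = T^3 + 0*T^2 + 1*T + 1, low-order coefficients c_0, c_1, c_2
Q = P ** N
ONE = (1,) + (0,) * (N - 1)

def mul(a, b):
    """Product in F_p[T]/<f(T)>; elements are coefficient tuples (a_0, ..., a_{n-1})."""
    prod = [0] * (2 * N - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % P
    for k in range(2 * N - 2, N - 1, -1):      # reduce with T^n = -(c_0 + c_1 T + ... + c_{n-1} T^{n-1})
        coef, prod[k] = prod[k], 0
        for i, ci in enumerate(F_LOW):
            prod[k - N + i] = (prod[k - N + i] - coef * ci) % P
    return tuple(prod[:N])

def power(a, e):
    """Square-and-multiply exponentiation in F_q."""
    result = ONE
    while e:
        if e & 1:
            result = mul(result, a)
        a = mul(a, a)
        e >>= 1
    return result

def generates_Fq_star(g, primes_of_q_minus_1=(2, 31)):   # q - 1 = 124 = 2^2 * 31
    return all(power(g, (Q - 1) // r) != ONE for r in primes_of_q_minus_1)

b = next(g for g in product(range(P), repeat=N) if any(g) and generates_Fq_star(g))
b_prime = power(b, (Q - 1) // (P - 1))                       # Lemma 4.8: a constant generating F_p^*
m_prime = {pow(b_prime[0], j, P): j for j in range(P - 1)}   # m'(a_0), by exhaustive search in F_p
L0 = {a0: m_prime[a0] * (Q - 1) // (P - 1) for a0 in range(1, P)}   # table (4.50)
print(b, b_prime, L0)
```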


Next, we determine the discrete logarithms of nonconstant polynomials to the base $b(T)$. For $1 \leq m \leq n$, define

$$L_m = \{p(x) \in \mathbb{F}_p[x] \mid p(x) \text{ is a monic irreducible polynomial},\ \deg p(x) \leq m\}, \tag{4.51}$$

and write $h_m$ for the number of irreducible polynomials in $L_m$, that is, $|L_m| = h_m$. We first compute the discrete logarithms of the irreducible polynomials in $L_m$.

Let $b = b(T)$ be the generator of $\mathbb{F}_q^*$, with $b(T) \in \mathbb{F}_p[T]$ and $\deg b(T) \leq n - 1$. Clearly, as $t$ runs through all positive integers from $1$ to $q - 1$, $b^t(T)$ runs through all nonzero polynomials in (4.49). Choose $t$ appropriately and let

$$b^t(T) \equiv c(T) \pmod{f(T)}, \quad \deg c(T) \leq n - 1,$$

such that

$$c(T) = c_0\prod_{p(T) \in L_m} p(T)^{\alpha_p}.$$

Denote by $\mathrm{ind}(a(T))$ the discrete logarithm of $a(T)$ to the base $b(T)$. From the above formula,

$$\mathrm{ind}(c(T)) \equiv \mathrm{ind}(c_0) + \sum_{p(T) \in L_m} \alpha_p\,\mathrm{ind}(p(T)) \pmod{q-1}.$$

Because $\mathrm{ind}(c(T)) = t$,

$$t - \mathrm{ind}(c_0) \equiv \sum_{p(T) \in L_m} \alpha_p\,\mathrm{ind}(p(T)) \pmod{q-1}. \tag{4.52}$$

By (4.50), $\mathrm{ind}(c_0)$ is known, so (4.52) is a linear equation in the $h_m$ unknowns $\mathrm{ind}(p(T))$. By repeatedly selecting suitable $t$, we can obtain $h_m$ independent linear equations, that is, equations whose $h_m \times h_m$ coefficient matrix is invertible modulo $q - 1$. By Lemma 4.2, this holds as long as the determinant of that matrix is coprime to $q - 1$. Solving this linear system gives all $\mathrm{ind}(p(T))$, and we obtain the following exponential integral table $B_m$:

$$B_m = \{\mathrm{ind}(p(T)) \mid p(T) \in L_m\}. \tag{4.53}$$


With the exponential integral table $B_m$, the discrete logarithm of any element $a(T) \in \mathbb{F}_q^*$ is easy to calculate. Let $a_1(T) = a(T)b(T)^t$, and select a suitable $t$ such that

$$a_1(T) \equiv a_0\prod_{p(T) \in L_m} p(T)^{\alpha_p} \pmod{f(T)}.$$

Once this decomposition is found,

$$\mathrm{ind}(a_1(T)) = \mathrm{ind}(a_0) + \sum_{p(T) \in L_m} \alpha_p\,\mathrm{ind}(p(T)).$$

Thus

$$\mathrm{ind}(a(T)) \equiv \mathrm{ind}(a_1(T)) - t \pmod{q-1}.$$

This gives the discrete logarithm of $a(T)$.
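The final step, using $B_m$ to find $\mathrm{ind}(a)$, is easiest to see in the prime-field analogue of the method. In that analogue, integers replace polynomials and small primes replace the irreducible polynomials of $L_m$; the Python sketch below makes this substitution explicitly. It also fills the table by brute force as a stand-in for solving the linear system (4.52), so it shows only the descent step. The prime $p = 1019$, the generator $b = 2$ and the factor base $\{2, 3, 5, 7\}$ are assumptions.

```python
def smooth_exponents(a, base):
    """Exponents of a over the factor base, or None if a does not factor completely over it."""
    exps = []
    for r in base:
        e = 0
        while a % r == 0:
            a //= r
            e += 1
        exps.append(e)
    return exps if a == 1 else None

def ind_via_table(a, b, p, base, table):
    """Find t with a * b^t smooth over the base; then ind(a) = sum e_r ind(r) - t (mod p - 1)."""
    for t in range(p - 1):
        exps = smooth_exponents(a * pow(b, t, p) % p, base)
        if exps is not None:
            return (sum(e * table[r] for e, r in zip(exps, base)) - t) % (p - 1)
    return None

p, b, base = 1019, 2, [2, 3, 5, 7]               # assumed toy prime, generator and factor base
logs = {pow(b, x, p): x for x in range(p - 1)}   # brute force, stand-in for solving (4.52)
table = {r: logs[r] for r in base}               # analogue of the table B_m
print(ind_via_table(500, b, p, base, table) == logs[500])   # True
```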


Remark 4.1 The key to the above calculation is choosing an appropriate $m$ for the exponential integral table $B_m$. This $m$ cannot be too large, because $h_m$ grows exponentially with $m$. For example, if $m$ is a prime number, then by Corollary 4.4,

$$h_m = |L_m| \geq N_p(m) = \frac{1}{m}(p^m - p).$$

When $h_m$ is too large, computing the table $B_m$ requires solving a linear system of order $h_m \times h_m$, and the computational complexity becomes exponential. On the other hand, $m$ cannot be too small either; the choice of $m$ depends on $p$ and $n$. When $p = 2$ and $n = 127$, the best choice is $m = 17$. The finite field $\mathbb{F}_q$ with $q = 2^{127}$ is chosen because $q - 1 = 2^{127} - 1$ is a Mersenne prime. This is a popular choice at present.


ElGamal cryptosystem

The basic idea of the ElGamal cryptosystem is to build an asymmetric cryptosystem on the computational difficulty of the discrete logarithm. Each user randomly selects a finite field $\mathbb{F}_q$, $q = p^n$, where $p$ is a sufficiently large prime, and computes a generator $g$ of $\mathbb{F}_q^*$. The user then selects a random positive integer $x$, $1 < x < q - 1$, and computes $y = g^x$. This gives the public key $P_e = (y, g, q)$ and the private key $P_d = (x, g, q)$.

Encryption algorithm: To send an encrypted message to user $A$, user $B$ first maps each plaintext unit of the plaintext space $P$ to an element of $\mathbb{F}_q^*$, and then encrypts each plaintext unit. Let $m \in \mathbb{F}_q^*$ be a plaintext unit. User $B$ randomly selects an integer $k$, $1 < k < q - 1$, and encrypts $m$ with the public key $(y, g, q)$ of user $A$. The encryption algorithm $f$ is

$$f(m) = c', \quad \text{where } c = g^k,\ c' = my^k. \tag{4.54}$$

This gives the ciphertext $(c, c')$.

Decryption algorithm: After receiving the ciphertext $(c, c')$ sent by user $B$, user $A$ decrypts $(c, c')$ with his own private key $(x, g, q)$. The decryption algorithm is

$$f^{-1}(c') = c'c^{-x}. \tag{4.55}$$


Lemma 4.11 The encryption algorithm $f$ defined by (4.54) is a one-to-one correspondence $\mathbb{F}_q^* \to \mathbb{F}_q^*$, and the inverse mapping $f^{-1}$ of $f$ is given by (4.55).

Proof By (4.54), $c = g^k$ and $c' = my^k$. Then

$$c'c^{-x} = my^kg^{-xk} = mg^{xk}g^{-xk} = m.$$

That is, $f^{-1}(f(m)) = m$. Conversely, writing $m = c'c^{-x}$,

$$my^k = c'c^{-x}g^{xk} = c'g^{-xk}g^{xk} = c',$$

that is, $f(f^{-1}(c')) = c'$. Therefore, $f$ is a one-to-one correspondence $\mathbb{F}_q^* \to \mathbb{F}_q^*$ and its inverse mapping is $f^{-1}$. The Lemma holds.
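Here is a minimal ElGamal sketch over a prime field. It assumes $q = p$ and $n = 1$ to keep the arithmetic to Python integers. The prime $p = 1019$ with generator $g = 2$ is a toy choice, and a fresh random $k$ is drawn for every message. The sketch is illustrative only; real deployments use far larger groups and a vetted library.

```python
import secrets

p, g = 1019, 2   # assumed toy field F_p (n = 1); 2 generates F_1019^* since 1018 = 2 * 509

def elgamal_keygen():
    """Private x with 1 < x < q - 1; public y = g^x."""
    x = secrets.randbelow(p - 3) + 2
    return pow(g, x, p), x

def elgamal_encrypt(m, y):
    """(4.54): c = g^k, c' = m y^k, with a fresh random k for every message."""
    k = secrets.randbelow(p - 3) + 2
    return pow(g, k, p), m * pow(y, k, p) % p

def elgamal_decrypt(c, c_prime, x):
    """(4.55): m = c' c^(-x); pow with a negative exponent needs Python 3.8+."""
    return c_prime * pow(c, -x, p) % p

y, x = elgamal_keygen()
c, c_prime = elgamal_encrypt(123, y)
print(elgamal_decrypt(c, c_prime, x))   # 123
```

Because $k$ is random, encrypting the same $m$ twice normally yields different ciphertexts, yet both decrypt to $m$.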


Finally, we discuss the computational complexity of arithmetic over finite fields.

Lemma 4.12 Let $\mathbb{F}_q$ be a finite field, $q = p^n$, $\alpha, \beta \in \mathbb{F}_q^*$, and let $k \geq 1$ be a positive integer. Then

$$\mathrm{Time}(\alpha\beta) = O(\log^2 q), \qquad \mathrm{Time}(\alpha^{-1}) = O(\log^3 q), \qquad \mathrm{Time}(\alpha^k) = O(\log k\,\log^2 q).$$

Proof Let $f(x) \in \mathbb{F}_p[x]$ be a monic irreducible polynomial with $\deg f(x) = n$. Then

$$\mathbb{F}_q = \mathbb{F}_p[x]/\langle f(x)\rangle = \{a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \mid a_i \in \mathbb{F}_p\}.$$

Let $\alpha, \beta \in \mathbb{F}_q^*$, with

$$\alpha = a_0 + a_1x + \cdots + a_{n-1}x^{n-1}, \qquad \beta = b_0 + b_1x + \cdots + b_{n-1}x^{n-1}.$$

Multiplying the two polynomials requires $O(n^2)$ operations modulo $p$, and each operation modulo $p$ takes $O(\log^2 p)$ bit operations. So forming the product $\alpha \cdot \beta$ as a polynomial in $\mathbb{F}_p[x]$ needs $O(n^2\log^2 p) = O(\log^2 q)$ bit operations. The resulting polynomial is then divided by $f(x)$ to obtain a polynomial of degree $\leq n - 1$, which is the final result $\alpha \cdot \beta$; this division also requires $O(n^2\log^2 p)$ bit operations. Therefore,

$$\mathrm{Time}(\alpha\beta) = O(\log^2 q + n^2\log^2 p) = O(\log^2 q).$$

$\mathrm{Time}(\alpha^{-1})$ and $\mathrm{Time}(\alpha^k)$ can be estimated in the same way; for $\alpha^k$, repeated squaring uses $O(\log k)$ multiplications. The Lemma holds.
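The bound $\mathrm{Time}(\alpha^k) = O(\log k\,\log^2 q)$ comes from repeated squaring: about $\log_2 k$ squarings plus at most as many extra multiplications. The following Python sketch counts the multiplications. The field multiplication is passed in as a function, and the example assumes the prime field $\mathbb{F}_{1019}$.

```python
def power_with_count(alpha, k, mul):
    """Left-to-right square-and-multiply for k >= 1.
    Returns (alpha^k, number of calls to mul); the count is at most 2 * (bit length of k - 1)."""
    result, count = alpha, 0
    for bit in bin(k)[3:]:          # skip '0b' and the leading 1 bit, covered by result = alpha
        result = mul(result, result)
        count += 1
        if bit == "1":
            result = mul(result, alpha)
            count += 1
    return result, count

# Assumed example: the prime field F_1019, whose multiplication is integer multiplication mod 1019.
print(power_with_count(3, 10**6, lambda a, c: a * c % 1019))
```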


4.7.4 Knapsack Problem

Given a pile of items with different weights, can we put all or some of them into a knapsack so that their total weight equals a given weight? This is the knapsack problem, which arises from real life. As a mathematical problem: suppose $A = \{a_0, a_1, \ldots, a_{n-1}\}$ is a set of $n$ positive integers and $N$ is a positive integer. Is $N$ the sum of the elements of some subset of $A$? Using the binary system, the knapsack problem can be stated as follows.

Knapsack problem: Given $N$ and $A = \{a_0, a_1, \ldots, a_{n-1}\}$, where each $a_i \geq 1$ is a positive integer, is there a binary integer $e = (e_{n-1}e_{n-2}\cdots e_1e_0)_2$ such that

$$\sum_{i=0}^{n-1} e_ia_i = N, \quad \text{where } e_i = 0 \text{ or } e_i = 1?$$

If $e$ exists, the knapsack problem $(A, N)$ is called solvable, denoted $\psi(A, N) = e$. If $N = 0$, then $\psi(A, 0) = 0$ (each $e_i = 0$) is called the trivial solution. We therefore assume that $N \geq 1$ is a positive integer.

The above knapsack problem may have a unique solution, no solution, or multiple solutions. Solving the general knapsack problem $(A, N)$ is very difficult: it is an "NP-complete" problem. So if the conjecture "P $\neq$ NP" holds, there is no general algorithm whose computational complexity is polynomial in $n$ and $\log N$. However, under some special conditions, such as the so-called super-increasing sequences, the problem becomes very easy. Next, we introduce a polynomial-time solution method for super-increasing sequences.


Definition 4.11 A positive integer sequence $\{a_i\}_{i\geq 0}$ is called a super-increasing sequence if each $a_i$ ($i \geq 1$) is greater than the sum of all the preceding terms, that is,

$$a_i > \sum_{j=0}^{i-1} a_j, \quad 1 \leq i < \infty. \tag{4.56}$$

Solving the knapsack problem for a super-increasing sequence amounts to finding a monotonically decreasing index sequence $\{i_k\}_{k\geq 0}$, where $i_k > i_{k+1}$ and $0 \leq i_k \leq n - 1$ for all $k \geq 0$. First, $i_0$ is defined as

$$i_0 = \max\{i \mid a_i \leq N\}. \tag{4.57}$$

Then consider $N - a_{i_0}$. If $N - a_{i_0} = 0$, the algorithm is completed and $N = a_{i_0}$. If $N - a_{i_0} > 0$, define

$$i_1 = \max\{i \mid a_i \leq N - a_{i_0}\}.$$

For any $k \geq 1$, define

$$i_k = \max\{i \mid a_i \leq N - a_{i_0} - \cdots - a_{i_{k-1}}\}. \tag{4.58}$$

If equality holds in (4.58), that is, $a_{i_k} = N - a_{i_0} - \cdots - a_{i_{k-1}}$, then the algorithm is completed and gives the solution $N = a_{i_0} + a_{i_1} + \cdots + a_{i_k}$ of $(A, N)$. If $i_k$ does not exist, that is,

$$N - a_{i_0} - \cdots - a_{i_{k-1}} < a_i, \quad \forall\, i \notin \{i_0, i_1, \ldots, i_{k-1}\},$$

we say the algorithm terminates. Clearly, the indices satisfy $i_0 > i_1 > \cdots > i_k > \cdots$. Let $I$ be the set of these indices, and denote the above algorithm by $\psi$.


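Lemma 4.13 Let $A = \{a_0, a_1, \ldots, a_{n-1}\}$ be a given set of positive integers forming a super-increasing sequence, and let $N$ be a positive integer. If there is a $k \geq 0$ such that $\psi$ is completed at $k$, that is, $a_{i_k} = N - a_{i_0} - \cdots - a_{i_{k-1}}$, then the knapsack problem $(A, N)$ has a solution, namely

$$\psi(A, N) = e = (e_{n-1}e_{n-2}\cdots e_1e_0)_2,$$

where

$$e_i = 1 \ \text{ if } i \in I, \qquad e_i = 0 \ \text{ if } i \notin I.$$

If there is a $k \geq 0$ at which $\psi$ terminates, i.e.,

$$N - a_{i_0} - \cdots - a_{i_{k-1}} < a_i, \quad \forall\, i \notin \{i_0, i_1, \ldots, i_{k-1}\},$$

then the knapsack problem $(A, N)$ has no solution.

Proof If $\psi$ is completed at $k \geq 0$, then

$$N = a_{i_0} + a_{i_1} + \cdots + a_{i_k}, \quad I = \{i_0, i_1, \ldots, i_k\}.$$

Let $e_i = 1$ when $i \in I$ and $e_i = 0$ when $i \notin I$. Clearly,

$$\sum_{i=0}^{n-1} e_ia_i = N.$$

So $\psi(A, N) = e = (e_{n-1}e_{n-2}\cdots e_1e_0)_2$.

Now suppose there is a $k \geq 0$ such that $\psi$ terminates at $k$, that is,

$$N - a_{i_0} - \cdots - a_{i_{k-1}} < a_i, \quad \forall\, i \notin \{i_0, i_1, \ldots, i_{k-1}\}.$$

We show by contradiction that $(A, N)$ has no solution. Suppose $(A, N)$ had a solution, say

$$N = a_{j_0} + a_{j_1} + \cdots + a_{j_r}.$$

Reordering, we may assume $j_0 > j_1 > \cdots > j_r$. By the definition of $i_0$ and since $a_{j_0} \leq N$, we know $j_0 \leq i_0$. If $j_0 < i_0$, then by (4.56),

$$N \geq a_{i_0} \geq a_{j_0+1} > \sum_{s=0}^{j_0} a_s \geq a_{j_0} + a_{j_1} + \cdots + a_{j_r},$$

which contradicts $N = a_{j_0} + a_{j_1} + \cdots + a_{j_r}$. Hence $j_0 = i_0$. Applying the same argument to $N - a_{i_0} = a_{j_1} + \cdots + a_{j_r}$ gives $j_1 = i_1$, and so on. Then $\psi$ would be completed rather than terminate, a contradiction. So $(A, N)$ has no solution. The Lemma holds.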
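The algorithm $\psi$ of (4.57) and (4.58) amounts to a greedy scan from the largest term down. Here is a minimal Python sketch, using an assumed super-increasing sequence.

```python
def is_superincreasing(A):
    """(4.56): each term exceeds the sum of all earlier terms."""
    total = 0
    for a in A:
        if a <= total:
            return False
        total += a
    return True

def psi(A, N):
    """Greedy algorithm of (4.57)-(4.58) for a super-increasing A.
    Returns the bit list e = [e_0, ..., e_{n-1}] with sum e_i a_i = N, or None if none exists."""
    e, remaining = [0] * len(A), N
    for i in range(len(A) - 1, -1, -1):     # indices i_0 > i_1 > ... are taken largest first
        if A[i] <= remaining:
            e[i] = 1
            remaining -= A[i]
    return e if remaining == 0 else None

A = [2, 3, 7, 14, 30, 57, 120, 251]          # assumed example sequence
print(psi(A, 159), psi(A, 6))                # [1, 0, 1, 0, 1, 0, 1, 0] None
```

The loop performs one comparison and at most one subtraction per term, so its cost is polynomial in $n$ and $\log N$, in contrast with a search over all $2^n$ subsets.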


MH knapsack public key encryption system

Merkle and Hellman proposed an encryption method based on the knapsack problem in 1978; it was one of the first public key encryption schemes. Let $A = \{a_0, a_1, \ldots, a_{n-1}\}$ be a super-increasing sequence of positive integers, and take two primes $p, b$ satisfying

$$p > \sum_{i=0}^{n-1} a_i, \quad 1 \leq b \leq p - 1. \tag{4.59}$$

Compute $t_i = ba_i \pmod p$, $0 \leq i \leq n - 1$. The public key is $t = (t_0, t_1, \ldots, t_{n-1})$, and the private key consists of $A$ and $b$.

Encryption algorithm: The plaintext space is $P = \mathbb{F}_2^n$. For each plaintext unit $m = (m_0m_1\cdots m_{n-1}) \in P$, the encryption algorithm is

$$c = f(m) \equiv \sum_{i=0}^{n-1} t_im_i \pmod p, \quad 0 \leq c < p, \tag{4.60}$$

where $c$ is the ciphertext.

Decryption algorithm: First, use the private key to compute $N \equiv b^{-1}c \pmod p$, $0 \leq N \leq p - 1$. Then use the algorithm $\psi = f^{-1}$ for the knapsack problem $(A, N)$ to solve

$$\psi(N) = (m_0m_1\cdots m_{n-1}) \in \mathbb{F}_2^n, \tag{4.61}$$

which gives the plaintext $m = (m_0m_1\cdots m_{n-1})$.
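A toy instance of the MH scheme makes (4.59) to (4.61) concrete. The sequence $A$, the modulus $p = 491$ and the multiplier $b = 41$ are illustrative assumptions, and far too small to be secure. Bits are stored as a list $[m_0, \ldots, m_{n-1}]$, and Python 3.8+ is assumed for `pow(b, -1, p)`.

```python
A = [2, 3, 7, 14, 30, 57, 120, 251]    # private super-increasing sequence (toy values)
p, b = 491, 41                         # p > sum(A) = 484 and 1 <= b <= p - 1 (both prime here)
t = [b * a % p for a in A]             # public key t_i = b a_i mod p

def mh_encrypt(bits, t=t, p=p):
    """(4.60): c = sum t_i m_i mod p, with bits = [m_0, ..., m_{n-1}]."""
    return sum(ti * mi for ti, mi in zip(t, bits)) % p

def mh_decrypt(c, A=A, p=p, b=b):
    """(4.61): N = b^{-1} c mod p, then solve the super-increasing knapsack (A, N) greedily."""
    N = pow(b, -1, p) * c % p
    bits, remaining = [0] * len(A), N
    for i in range(len(A) - 1, -1, -1):
        if A[i] <= remaining:
            bits[i], remaining = 1, remaining - A[i]
    return bits if remaining == 0 else None

m = [1, 0, 1, 1, 0, 0, 1, 0]
print(mh_decrypt(mh_encrypt(m)) == m)  # True
```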


The correctness of MH knapsack public key cryptography follows from the following lemma.

Lemma 4.14 The encryption algorithm $f$ defined by (4.60) is a one-to-one (injective) mapping $\mathbb{F}_2^n \to \mathbb{F}_p$, and its inverse mapping $f^{-1}$ on the image of $f$ is given by (4.61).

Proof If $m = 0$ is the zero vector in $\mathbb{F}_2^n$, then $c = 0$, so $N = 0$. The knapsack problem $(A, 0)$ has the unique trivial solution $\psi(A, 0) = 0 \in \mathbb{F}_2^n$, the zero vector. Therefore, the zero vector in $\mathbb{F}_2^n$ corresponds to the zero element of $\mathbb{F}_p$. Now let $m \neq 0$, and let

$$N \equiv b^{-1}c \pmod p, \qquad c \equiv \sum_{i=0}^{n-1} t_im_i \pmod p.$$

Then

$$N \equiv \sum_{i=0}^{n-1} m_ib^{-1}t_i \equiv \sum_{i=0}^{n-1} m_ia_i \pmod p.$$

By (4.59), $0 \leq \sum_{i=0}^{n-1} m_ia_i < p$, and $0 \leq N < p$, so

$$N = \sum_{i=0}^{n-1} m_ia_i \ \Longrightarrow\ \psi(A, N) = m = (m_0m_1\cdots m_{n-1}).$$

So we have

$$f^{-1}(f(m)) = m, \quad \forall\, m \in \mathbb{F}_2^n.$$

Conversely, if

$$N = \sum_{i=0}^{n-1} m_ia_i,$$

then

$$bN = \sum_{i=0}^{n-1} m_ia_ib \equiv \sum_{i=0}^{n-1} m_it_i \pmod p.$$

So $N \equiv b^{-1}c \pmod p$, that is,

$$f(f^{-1}(c)) = c$$

for every $c$ in the image of $f$. It follows that $f$ is a one-to-one correspondence between $\mathbb{F}_2^n$ and its image in $\mathbb{F}_p$, with inverse mapping $f^{-1} = \psi$. The Lemma holds.


The above discussion shows the following. For a sequence that is not super-increasing, such as the public key $t$ as seen by an outsider, the decryption problem $f^{-1}$ is an instance of the "NP-complete" knapsack problem. So the encryption and decryption algorithms of the MH knapsack cryptosystem form a most typical trapdoor one-way function. For this reason, people believed for some time that MH knapsack public key cryptography was very secure. However, in 1982 Shamir proved that a class of non-super-increasing sequences can be transformed into super-increasing sequences by a simple transformation $x \mapsto ax \bmod m$, after which the problem can be solved by a polynomial algorithm. Although this kind of convertible non-super-increasing knapsack problem is quite special, it was enough to shake confidence in the security of knapsack public key cryptosystems. It is now generally accepted that knapsack public key cryptography is no longer secure.


Shamir transform 
Let A; = (09,0, ..., 0, 1] is a super-increasing sequence of positive integers. 
Randomly select four positive integers m, a1, m», a», where 


n—l 
mı > 9 joi, m > nmi, (ay, mi) = (m, m) = 1. (4.62) 
i=0 


A new sequence of positive integers is defined by $m_1$ and $a_1$:
$$A_2 = (w_0, w_1, \ldots, w_{n-1}), \quad w_i = a_1\alpha_i \bmod m_1,$$
where $a_1\alpha_i \bmod m_1$ denotes the least nonnegative residue of $a_1\alpha_i$ modulo $m_1$, that is,
$$0 \le w_i < m_1, \quad w_i \equiv a_1\alpha_i \pmod{m_1}. \qquad (4.63)$$

A third sequence of positive integers is defined by $m_2$ and $a_2$:
$$A_3 = (u_0, u_1, \ldots, u_{n-1}), \quad u_i = a_2 w_i \bmod m_2,$$
that is,
$$0 \le u_i < m_2, \quad u_i \equiv a_2 w_i \pmod{m_2}. \qquad (4.64)$$


Because $\{u_i\}$ is in general not a super-increasing sequence, encryption with $A_3$ appears to lead to a general knapsack problem, whose difficulty would be NP-complete; the Shamir transform, however, shows that decryption can be carried out in polynomial time.

Let $x = (e_{n-1}e_{n-2}\cdots e_1e_0)_2 \in \mathbb{F}_2^n$ be the plaintext, and encrypt it with $A_3$:
$$c = f(x) = \sum_{i=0}^{n-1} e_i u_i, \qquad (4.65)$$
to obtain the ciphertext $c$. For anyone who receives only $c$, decryption is a general knapsack problem, but with the private key $(b_1, m_1, b_2, m_2)$ it becomes quite simple, where
$$0 < b_1 < m_1, \quad a_1 b_1 \equiv 1 \pmod{m_1};$$
$$0 < b_2 < m_2, \quad a_2 b_2 \equiv 1 \pmod{m_2}.$$
First, consider the least nonnegative residue of $b_2c$ modulo $m_2$:
$$N_0 = b_2 c \bmod m_2 = \sum_{i=0}^{n-1} e_i w_i. \qquad (4.66)$$
Indeed, by (4.65) and (4.64),
$$b_2 c = \sum_{i=0}^{n-1} e_i b_2 u_i \equiv \sum_{i=0}^{n-1} e_i w_i \pmod{m_2}.$$
By the assumption $m_2 > n m_1$ in (4.62) and by (4.63),
$$0 \le \sum_{i=0}^{n-1} e_i w_i < n m_1 < m_2.$$
So (4.66) holds. Next consider the least nonnegative residue $N = b_1 N_0 \bmod m_1$ $(0 \le N < m_1)$. By (4.63),
$$N \equiv b_1 N_0 = \sum_{i=0}^{n-1} e_i b_1 w_i \equiv \sum_{i=0}^{n-1} e_i \alpha_i \pmod{m_1}.$$
Since $0 \le \sum e_i \alpha_i \le \sum \alpha_i < m_1$ by (4.62), we have
$$N = \sum_{i=0}^{n-1} e_i \alpha_i, \quad \alpha_i \in A_1.$$
Since $A_1$ is a super-increasing sequence, the polynomial-time algorithm of Lemma 4.13 applied to $(A_1, N)$ returns
$$(e_{n-1}e_{n-2}\cdots e_1e_0)_2 = x,$$
which recovers the plaintext $x$.
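A minimal Python sketch of the two-layer construction (4.62)-(4.65) and of the decryption steps just described. The toy parameters and function names are our own illustrative choices, not values from the text: decryption reduces $b_2c$ modulo $m_2$, then $b_1N_0$ modulo $m_1$, and finishes with the greedy super-increasing solver.

```python
from math import gcd

def solve_superincreasing(alpha, target):
    """Greedy algorithm for a super-increasing knapsack (cf. Lemma 4.13)."""
    bits = [0] * len(alpha)
    for i in range(len(alpha) - 1, -1, -1):
        if alpha[i] <= target:
            bits[i] = 1
            target -= alpha[i]
    return bits if target == 0 else None

def make_keys(alpha, m1, a1, m2, a2):
    """Build the public sequence A_3 = (u_i) and the private key (b1, m1, b2, m2)."""
    n = len(alpha)
    assert m1 > sum(alpha) and m2 > n * m1         # conditions (4.62)
    assert gcd(a1, m1) == 1 and gcd(a2, m2) == 1
    w = [(a1 * x) % m1 for x in alpha]              # A_2, see (4.63)
    u = [(a2 * wi) % m2 for wi in w]                # A_3, see (4.64)
    private = (pow(a1, -1, m1), m1, pow(a2, -1, m2), m2)
    return u, private

def encrypt(bits, u):
    """c = sum e_i u_i as in (4.65); bits[i] plays the role of e_i."""
    return sum(e * ui for e, ui in zip(bits, u))

def decrypt(c, alpha, private):
    b1, m1, b2, m2 = private
    n0 = (b2 * c) % m2     # N_0 = sum e_i w_i exactly, by (4.66)
    n1 = (b1 * n0) % m1    # N = sum e_i alpha_i exactly
    return solve_superincreasing(alpha, n1)

# Hypothetical toy parameters, chosen only for illustration
ALPHA = [1, 3, 5, 11, 23, 46]              # super-increasing, sum = 89
U, PRIV = make_keys(ALPHA, m1=97, a1=33, m2=601, a2=250)
```

With these values `U` is `[437, 500, 172, 571, 167, 124]`, which is not super-increasing (172 < 437 + 500), yet every 6-bit message decrypts correctly with `decrypt(encrypt(bits, U), ALPHA, PRIV)`.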

Thus Shamir uses a simple transformation to turn an apparently general knapsack problem into a super-increasing knapsack problem. Although $A_3$ is very special, this gives us reason to doubt that public key cryptosystems based on the general knapsack problem are as secure as people think.


Exercise 4 


1. Explain the following terms: (1) one-time pad, (2) perfect secrecy system, (3) unicity distance, (4) perfect authentication system.
2. Short answer:

(1) What are the advantages and disadvantages of symmetric and asymmetric cryptosystems?
(2) What is the goal of a perfect authentication system?


3. It is known that the plaintext "Friday" is encrypted by a Hill cipher with $m = 2$ into the ciphertext "PQCFKU". Find the key of the Hill cipher.

4. Find the inverse matrix (mod N) of each of the following matrices:


13 13 
A- |33| moas, A= i3 | moa29, 


15 17 197 62 
A= | 4 9 I A= E 5 | mod BL. 


5. In number theory, the Fibonacci numbers are defined by $a_1 = 1$, $a_2 = 1$, $a_3 = 2$, and $a_{n+1} = a_n + a_{n-1}$ for $n > 1$. Prove that
$$\begin{pmatrix} a_{n+1} & a_n \\ a_n & a_{n-1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{n},$$
and that $a_n$ is even if and only if $3 \mid n$. More generally, find the rule for when $d \mid a_n$.

6. Suppose $N = mn$ with $(n, m) = 1$. A second-order matrix $A \in M_2(\mathbb{Z}_N)$ can be regarded as an element of $M_2(\mathbb{Z}_m)$ and of $M_2(\mathbb{Z}_n)$; let $A_1$ and $A_2$ denote $A$ as an element of $M_2(\mathbb{Z}_m)$ and of $M_2(\mathbb{Z}_n)$, respectively. Prove:

(i) The mapping $\sigma: A \mapsto (A_1, A_2)$ is a one-to-one correspondence $M_2(\mathbb{Z}_N) \to M_2(\mathbb{Z}_m) \times M_2(\mathbb{Z}_n)$.

(ii) Under the correspondence $\sigma$, $A$ is invertible (mod $N$) if and only if $A_1$ is invertible (mod $m$) and $A_2$ is invertible (mod $n$).

7. Let $p$ be a prime and $\alpha \ge 1$. Prove that $A \in M_2(\mathbb{Z}_{p^\alpha})$ is invertible if and only if $A$, regarded as an element of $M_2(\mathbb{Z}_p)$, is invertible. For every $\alpha \ge 1$, compute the number of invertible matrices in $M_2(\mathbb{Z}_{p^\alpha})$.

8. Let $\varphi(N)$ be Euler's function and let $\varphi_2(N)$ be the number of invertible matrices in $M_2(\mathbb{Z}_N)$. Find a formula for $\varphi_2(N)$ analogous to the formula $\varphi(N) = N\prod_{p \mid N}\left(1 - \frac{1}{p}\right)$, that is, $\varphi_2(N) = ?$

9. Let $\varphi_k(N)$ be the number of invertible $k \times k$ matrices in $M_k(\mathbb{Z}_N)$, and give a formula for $\varphi_k(N)$.

10. Using Exercises 8 and 9, find the order of the $k$-dimensional affine transformation group $G = \{(A, b)\}$ over $\mathbb{Z}_N$.

11. RSA is used for encryption. The plaintext and ciphertext alphabet consists of the 40 numbers $\{0, 1, 2, \ldots, 39\}$, of which $\{0, 1, \ldots, 25\}$ correspond to the 26 English letters; blank = 26, "." = 27, "?" = 28, "\$" = 29, and the digits $0, 1, \ldots, 9$ correspond to $30, 31, \ldots, 39$. Suppose all public keys $n_A$ satisfy $40^2 < n_A < 40^3$. A plaintext unit is $m = m_1m_2 \in \mathbb{Z}_{40}^2$ and a ciphertext unit is $c = c_1c_2c_3 \in \mathbb{Z}_{40}^3$; each plaintext unit $m = m_1m_2$ corresponds to the number $m_2 \cdot 40 + m_1 \in \mathbb{Z}_{n_A}$, and each ciphertext unit to the number $c_3 \cdot 40^2 + c_2 \cdot 40 + c_1 \in \mathbb{Z}_{n_A}$.

(i) Encrypt the plaintext "SEND \$7500" with the public key $(n_A, e_A) = (2047, 179)$.
(ii) Factor $n_A = 2047$ to find the private key $(n_A, d_A)$.
(iii) A cryptanalyst can quickly find the private key $d_A$ without factoring 2047, so $n_A = 2047$ is a rather bad choice. Why?

12. Use a computer to attack the public key $(n_A, e_A) = (536813567, 3602561)$ and find the private key $d_A$; this shows that a 29-bit $n_A$ is not safe in the RSA system.

13. Suppose the plaintext alphabet is $\{0, 1, \ldots, 26\}$, where the first 26 numbers are the 26 English letters and blank = 26, and the ciphertext alphabet adds "|" = 27 to the plaintext alphabet, for a total of 28 numbers. A plaintext unit is $m = m_1m_2m_3 \in \mathbb{Z}_{27}^3$ and a ciphertext unit is $c = c_1c_2c_3 \in \mathbb{Z}_{28}^3$. Using the corresponding numbers in $\mathbb{Z}_{n_A}$ (see Exercise 11), we need
$$19683 = 27^3 < n_A < 28^3 = 21952.$$

(i) If your decryption key is $(n_A, d_A) = (21583, 20787)$, decrypt the ciphertext "YSNAUOZHXXH " (with a blank at the end).
(ii) If you know that the Euler function $\varphi(n) = 21280$, compute $e = d^{-1} \bmod \varphi(n)$ and factor $n$.

14. Prove that in RSA the 35-bit integer $n = 23360947609$ is a particularly bad choice. (Hint: in the factorization $n = p \cdot q$, $p$ and $q$ are very close to each other, so Fermat factorization can be used to attack $n$.)

15. Let $n$ be a square-free number and $de \equiv 1 \pmod{\varphi(n)}$. Prove that the congruence
$$a^{de} \equiv a \pmod n$$
holds for all integers $a$.

16. The multiplicative group Fig, of the finite field F;g; is generated by $g = 2$; use the smooth factor method to compute the discrete logarithm of 153 to base 2.

17. For each of the following, determine whether the sequence is super-increasing, whether the knapsack problem $(A, N)$ is solvable for the given $N$, and how many solutions there are:

(i) $A = \{2, 3, 7, 20, 35, 69\}$, $N = 45$;
(ii) $A = \{1, 2, 5, 9, 20, 49\}$, $N = 73$;
(iii) $A = \{1, 3, 7, 12, 22, 45\}$, $N = 67$;
(iv) $A = \{2, 3, 6, 11, 21, 40\}$, $N = 39$;
(v) $A = \{4, 5, 10, 30, 50, 101\}$, $N = 186$.

18. If $A = \{a_i \mid i = 0, 1, 2, \ldots\}$ is a super-increasing sequence with $a_0 = 1$, and each $a_i$ is the smallest positive integer satisfying $a_i > \sum_{j=0}^{i-1} a_j$, then $a_i = 2^i$ holds for all $i \ge 1$.

19. Let $A = \{a_0, a_1, \ldots, a_i, \ldots\}$ be a super-increasing sequence with $a_i = 2^i$ $(i \ge 1)$. Then for any positive integer $N$, the knapsack problem $(A, N)$ has a unique solution.

20. Let $A = \{a_0, a_1, \ldots, a_i, \ldots\}$ be a super-increasing sequence. If for every positive integer $N$ the knapsack problem $(A, N)$ has a solution, prove that $a_i = 2^i$ $(i \ge 1)$.


References 


Adleman, L. M., Rivest, R. L., & Shamir, A. (1978). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21, 120-126.

Adleman, L. M. (1979). A subexponential algorithm for the discrete logarithm problem with application to cryptography. In Proceedings of the 20th Annual Symposium on the Foundations of Computer Science, pp. 55-60.

Blum, M. (1982). Coin flipping by telephone: A protocol for solving impossible problems (pp. 133-137). Spring COMPCON, IEEE Proceedings.

Coppersmith, D. (1984). Fast evaluation of logarithms in fields of characteristic two. IEEE Transactions on Information Theory, IT-30, 587-594.

Cover, T. M. (2003). Fundamentals of information theory. Tsinghua University Press (in Chinese).

Diffie, W., & Hellman, M. E. (1976). New directions in cryptography. IEEE Transactions on Information Theory, IT-22, 644-654.

ElGamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, IT-31(4), 469-472.

Fiat, A., & Shamir, A. (1987). How to prove yourself: Practical solutions to identification and signature problems. In Advances in Cryptology, CRYPTO '86 (Vol. 263, pp. 186-194). Springer-Verlag, LNCS.

Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NP-completeness. Freeman.

Goldreich, O. (2001). Foundations of cryptography. Cambridge University Press.

Gordon, J. A. (1985). Strong primes are easy to find. In Advances in Cryptology, Proceedings of Eurocrypt 84 (pp. 216-223). Springer.

Hellman, M. E., & Merkle, R. C. (1978). Hiding information and signatures in trapdoor knapsacks. IEEE Transactions on Information Theory, IT-24, 525-530.

Hellman, M. E. (1979). The mathematics of public-key cryptography. Scientific American, 241, 146-157.

Hill, L. S. (1931). Concerning certain linear transformation apparatus of cryptography. American Mathematical Monthly, 38, 135-154.

Kahn, D. (1967). The codebreakers: The story of secret writing. Macmillan.

Knuth, D. E. (1973). The art of computer programming. Addison-Wesley.

Koblitz, N. (1994). A course in number theory and cryptography. Springer-Verlag.

Kranakis, E. (1986). Primality and cryptography. John Wiley & Sons.

Massey, J. L. (1983). Logarithms in finite cyclic groups: Cryptographic issues. In Proceedings of the 4th Benelux Symposium on Information Theory, pp. 17-25.

Odlyzko, A. M. (1985). Discrete logarithms in finite fields and their cryptographic significance. In Advances in Cryptology, Proceedings of Eurocrypt 84, pp. 224-314. Springer.

Rivest, R. L. (1985). RSA chips (past, present, and future). In Advances in Cryptology, Proceedings of Eurocrypt 84, pp. 159-165.

Ruggiu, G. (1985). Cryptology and complexity theories. In Advances in Cryptology, Proceedings of Eurocrypt 84 (pp. 3-9). Springer.

Schneier, B. (1996). Applied cryptography. John Wiley & Sons.

Shamir, A. (1982). A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem. In Proceedings of the 23rd Annual Symposium on the Foundations of Computer Science, pp. 145-152.

Shannon, C. E. (1949). Communication theory of secrecy systems. The Bell System Technical Journal, 28, 656-715.

Stinson, D. R. (2003). Principles and practice of cryptography, translated by Guodeng Feng. Electronic Industry Press (in Chinese).

Trappe, W., & Washington, L. C. (2008). Cryptography and coding theory, translated by Quanlong Wang et al. People's Posts and Telecommunications Publishing House (in Chinese).

Wah, P., & Wang, M. Z. (1984). Realization and application of the Massey-Omura lock. In Proceedings of the International Zürich Seminar (1984), 175-182.


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter's Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter's Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Chapter 5
Prime Test


In the RSA algorithm of the previous chapter, we saw that the difficulty of factoring large numbers into prime factors is the basis of the security of the RSA cryptosystem. Theoretically, this security should not be in doubt, because mathematics provides only the definition of a prime and no simple general criterion for detecting primes. The main purpose of this chapter is to introduce some basic prime test methods, including the Fermat test, the Euler test, the Monte Carlo method and the continued fraction method. Understanding this chapter requires some specialized knowledge of number theory.


5.1 Fermat Test 


According to Fermat's congruence theorem (commonly known as Fermat's little theorem, which is a special case of Euler's congruence theorem), if $n$ is a prime, then for every integer $b$ with $(b, n) = 1$,
$$b^{n-1} \equiv 1 \pmod n. \qquad (5.1)$$

This congruence is an important property of primes. Although an $n$ satisfying it is not necessarily prime, it serves as an important basis for detecting primes, because any $n$ that does not satisfy it is definitely not prime. Using (5.1) as the criterion for detecting primes is called the Fermat test.
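As a hedged illustration, the Fermat test (5.1) reduces to one modular exponentiation, which Python's built-in three-argument `pow` performs by fast exponentiation. The function name `fermat_test` is our own. For example, $341 = 11 \cdot 31$ passes the test to base 2 but fails it to base 3.

```python
from math import gcd

def fermat_test(n, b):
    """Fermat test (5.1) of n to base b.

    Returns True if b^(n-1) ≡ 1 (mod n). A False result proves that n is
    composite; True only means n is prime or a Fermat pseudoprime to base b.
    """
    if gcd(b, n) != 1:
        raise ValueError("the base b must satisfy (b, n) = 1")
    return pow(b, n - 1, n) == 1  # fast modular exponentiation
```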


Definition 5.1 Let $n$ be an odd composite number (not a prime). If there is a positive integer $b$ with $(b, n) = 1$ satisfying
$$b^{n-1} \equiv 1 \pmod n,$$
then the composite number $n$ is called a Fermat pseudoprime to base $b$.


We now discuss the basic properties of pseudoprimes. Our working platform is the finite Abelian group $\mathbb{Z}_n^*$, defined by
$$\mathbb{Z}_n^* = \{\bar a \mid 1 \le a \le n,\ (a, n) = 1\}, \quad n > 1, \qquad (5.2)$$
where $\bar a$ is the congruence class mod $n$ represented by $a$. The product of two congruence classes is defined as $\bar a \cdot \bar b = \overline{ab}$; clearly, $\mathbb{Z}_n^*$ is an Abelian group of order $\varphi(n)$ under multiplication. In a finite group $G$, the order of an element $g \in G$ is defined as
$$o(g) = \min\{m : g^m = 1,\ 1 \le m \le |G|\}.$$
Thus $o(g) = 1$ if and only if $g$ is the identity of $G$. By the definition of $o(g)$,
$$g^t = 1 \iff o(g) \mid t. \qquad (5.3)$$


The following two lemmas are basic facts about the order of a group element $g$.

Lemma 5.1 Let $G$ be a finite group, $g \in G$, and $k \in \mathbb{Z}$. Then
$$o(g^k) = \frac{o(g)}{(k, o(g))}, \qquad (5.4)$$
where the denominator is the greatest common divisor of $k$ and $o(g)$.


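Proof Let $o(g) = m$ and $o(g^k) = t$. Clearly,
$$(g^k)^{\frac{m}{(k,m)}} = (g^m)^{\frac{k}{(k,m)}} = 1,$$
so by (5.3), $t \mid \frac{m}{(k,m)}$. On the other hand, $g^{kt} = 1$ gives $m \mid kt$, thus
$$\frac{m}{(k,m)} \,\Big|\, \frac{k}{(k,m)}\, t \ \Longrightarrow\ \frac{m}{(k,m)} \,\Big|\, t,$$
since $\frac{m}{(k,m)}$ and $\frac{k}{(k,m)}$ are coprime. So $t = \frac{m}{(k,m)}$, and the Lemma holds.

Lemma 5.2 Suppose $G$ is a finite Abelian group, $a, b \in G$, and $(o(a), o(b)) = 1$. Then $o(ab) = o(a)o(b)$.

Proof Let $o(a) = m_1$ and $o(b) = m_2$; then $(m_1, m_2) = 1$. Let $o(ab) = t$. Since $(ab)^{m_1m_2} = a^{m_1m_2}b^{m_1m_2} = 1$, we have $t \mid m_1m_2$. On the other hand, $(ab)^t = 1$ implies $(ab)^{tm_1} = 1$, thus $b^{tm_1} = 1$, so $m_2 \mid m_1t$ and hence $m_2 \mid t$. By the same reasoning $m_1 \mid t$, so $m_1m_2 \mid t$ and $t = m_1m_2$. The Lemma holds.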
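To experiment with Lemmas 5.1 and 5.2, a brute-force order function is enough for small moduli. This is a sketch (the name `mult_order` is ours); it checks $o(g^k) = o(g)/(k, o(g))$ for the primitive root $g = 5$ modulo 23, and Lemma 5.2 can be seen with $o(2) = 11$ and $o(22) = 2$ modulo 23, whose product $2 \cdot 22 \equiv 21$ has order 22.

```python
from math import gcd

def mult_order(a, n):
    """Order o(a) of a mod n (n > 1): least m >= 1 with a^m ≡ 1 (mod n).

    Brute force over successive powers of a; adequate for small n only.
    """
    if n <= 1 or gcd(a, n) != 1:
        raise ValueError("need n > 1 and (a, n) = 1")
    m, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        m += 1
    return m

# Lemma 5.1 with g = 5, a primitive root mod 23 (o(5) = 22)
for k in range(1, 45):
    assert mult_order(pow(5, k, 23), 23) == 22 // gcd(k, 22)
```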


Back to the finite group $\mathbb{Z}_n^*$: for any integer $a$ with $(a, n) = 1$ we have $\bar a \in \mathbb{Z}_n^*$; we write $o(a)$ for $o(\bar a)$ and call it the order of $a$ mod $n$. Clearly $o(a) = o(b)$ if $a \equiv b \pmod n$. A basic problem in number theory is the existence of primitive roots mod $n$, or equivalently, whether $\mathbb{Z}_n^*$ is a cyclic group. If there is a positive integer $a$ with $(a, n) = 1$ and $o(a) = |\mathbb{Z}_n^*| = \varphi(n)$, then $\mathbb{Z}_n^*$ is a cyclic group of order $\varphi(n)$; in that case a primitive root mod $n$ exists, and $a$ is a primitive root mod $n$.


Lemma 5.3 (Existence of primitive roots) A primitive root mod $n$ exists if and only if $n$ is one of the four cases $n = 2$, $4$, $p^\alpha$ $(\alpha \ge 1)$ or $2p^\alpha$ $(\alpha \ge 1)$, where $p > 2$ is an odd prime.


Proof If $n = 2$ or $4$, the lemma holds. If $n = p$, then $\mathbb{Z}_p = \mathbb{F}_p$ and $\mathbb{Z}_p^* = \mathbb{F}_p^*$; by Lemma 4.7 of Chap. 4, $\mathbb{F}_p^*$ is a cyclic group of order $p - 1$, so primitive roots mod $p$ exist. Now we prove that for every positive integer $\alpha$, a primitive root mod $p^\alpha$ also exists. Let $a$ be a primitive root mod $p$, that is, the order of $a$ mod $p$ is $p - 1$. If $o(a)$ denotes the order of $a$ mod $p^\alpha$, then
$$a^{o(a)} \equiv 1 \pmod{p^\alpha} \ \Longrightarrow\ a^{o(a)} \equiv 1 \pmod p,$$
so $p - 1 \mid o(a)$. Since $\mathbb{Z}_{p^\alpha}^*$ has $\varphi(p^\alpha) = p^{\alpha-1}(p-1)$ elements, $o(a) \mid p^{\alpha-1}(p-1)$; thus $o(a) = p^i(p-1)$ for some $0 \le i \le \alpha - 1$.

We may assume $o(a) = p - 1$: if $o(a) = p^i(p-1)$ with $i \ge 1$, replace $a$ by $a^{p^i}$ (which is still congruent to $a$ mod $p$). By Lemma 5.1,
$$o\left(a^{p^i}\right) = \frac{p^i(p-1)}{\left(p^i,\ p^i(p-1)\right)} = p - 1.$$
Therefore, without loss of generality, let $o(a) = p - 1$. When $\alpha > 1$, we have $p^{\alpha-1} \mid \varphi(p^\alpha)$, and there is an integer $b$ with $(b, p) = 1$ whose order mod $p^\alpha$ is $p^{\alpha-1}$; for example, $b = 1 + p$ works, since for odd $p$ one has $(1+p)^{p^k} \equiv 1 + p^{k+1} \pmod{p^{k+2}}$. Because $(o(a), o(b)) = 1$, Lemma 5.2 gives
$$o(ab) = o(a)o(b) = p^{\alpha-1}(p-1) = \varphi(p^\alpha),$$
so a primitive root mod $p^\alpha$ exists.

When $n = 2p^\alpha$ with $p > 2$ an odd prime, $\varphi(n) = \varphi(p^\alpha)$. Thus a primitive root $a$ mod $p^\alpha$ that is odd (replace $a$ by $a + p^\alpha$ if necessary) is also a primitive root mod $2p^\alpha$. The Lemma holds.
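Lemma 5.3 can be checked numerically for small moduli. The sketch below (all names are ours) searches by brute force for an element of order $\varphi(n)$ and compares the answer with the list $2, 4, p^\alpha, 2p^\alpha$; it proves nothing, but it makes the four cases concrete.

```python
from math import gcd

def phi(n):
    """Euler's function by trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def order_mod(a, n):
    """Multiplicative order of a unit a modulo n, by brute force."""
    x, m = a % n, 1
    while x != 1:
        x = (x * a) % n
        m += 1
    return m

def has_primitive_root(n):
    """Does some a with (a, n) = 1 have order phi(n) mod n?"""
    target = phi(n)
    return any(order_mod(a, n) == target for a in range(1, n) if gcd(a, n) == 1)

def lemma_5_3_form(n):
    """True iff n is 2, 4, p^alpha or 2 p^alpha with p an odd prime."""
    if n in (2, 4):
        return True
    if n % 2 == 0:
        n //= 2
    if n % 2 == 0 or n == 1:
        return False
    p = next(d for d in range(3, n + 1, 2) if n % d == 0)  # least prime factor
    while n % p == 0:
        n //= p
    return n == 1

# Compare the brute-force answer with Lemma 5.3 for small moduli
assert all(has_primitive_root(n) == lemma_5_3_form(n) for n in range(2, 150))
```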


Lemma 5.4 Let $n$ be an odd composite number. Then

(i) If $b > 1$ is a positive integer with $(b, n) = 1$, then $n$ is a Fermat pseudoprime to base $b$ if and only if $o(b) \mid n - 1$.

(ii) If $n$ is a Fermat pseudoprime to bases $b_1$ and $b_2$, then it is a Fermat pseudoprime to bases $b_1b_2$ and $b_1^{-1}$, where $b_1^{-1}$ is the multiplicative inverse of $b_1$ mod $n$.

(iii) If there exists one $b \in \mathbb{Z}_n^*$ that does not satisfy (5.1), then at least half of the elements of $\mathbb{Z}_n^*$ do not satisfy (5.1).

Proof (i) and (ii) are trivial; (i) follows from (5.3). For (ii), let $b_1, b_2 \in \mathbb{Z}_n^*$; then
$$b_1^{n-1} \equiv 1,\ b_2^{n-1} \equiv 1 \pmod n \ \Longrightarrow\ (b_1b_2)^{n-1} \equiv 1 \pmod n,$$
$$b_1^{n-1} \equiv 1 \pmod n \ \Longrightarrow\ \left(b_1^{-1}\right)^{n-1} \equiv 1 \pmod n.$$
So (ii) holds. To prove (iii), suppose $n$ is not a Fermat pseudoprime to base $b$. If $n$ is a Fermat pseudoprime to base $a$, then $n$ is not a Fermat pseudoprime to base $ab$; otherwise, by (ii), $n$ would be a Fermat pseudoprime to base $(ab)a^{-1} = b$. Hence the injective map $a \mapsto ab$ of $\mathbb{Z}_n^*$ sends every base to which $n$ is a Fermat pseudoprime to a base to which it is not, so at least half of the bases in $\mathbb{Z}_n^*$ fail (5.1). The Lemma holds.


By Lemma 5.4(iii), if there is a base $b$ to which $n$ is not a Fermat pseudoprime, then testing a randomly chosen $a$, $1 < a < n$, $(a, n) = 1$, for $a^{n-1} \equiv 1 \pmod n$ gives more than a 50% chance of finding a base $b$ with $b^{n-1} \not\equiv 1 \pmod n$, which proves that $n$ is not prime. Is it possible that $n$ is a Fermat pseudoprime to every base $a$ with $1 < a < n$ and $(a, n) = 1$? The answer is yes; such a number $n$ is called a Carmichael number.
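Lemma 5.4(iii) can be seen numerically by counting the bases that pass the Fermat test. In this sketch (function names are ours), $n = 91 = 7 \cdot 13$ has exactly 36 passing bases among its 72 units, so "at least half fail" is attained with equality, while for $561 = 3 \cdot 11 \cdot 17$ every unit passes.

```python
from math import gcd

def units(n):
    """Representatives 1 <= b < n with (b, n) = 1 of the group Z_n^*."""
    return [b for b in range(1, n) if gcd(b, n) == 1]

def fermat_liars(n):
    """Bases b in Z_n^* to which n passes the Fermat test (5.1)."""
    return [b for b in units(n) if pow(b, n - 1, n) == 1]

for n in (91, 561):
    print(n, len(fermat_liars(n)), len(units(n)))
# 91 36 72
# 561 320 320
```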


Definition 5.2 A Carmichael number $n$ is an odd composite number such that for every $b \in \mathbb{Z}_n^*$,
$$b^{n-1} \equiv 1 \pmod n.$$

For Carmichael numbers, we have the following characterization.


Theorem 5.1 Let $n$ be a composite number. Then

(i) If there is an integer $a > 1$ with $a^2 \mid n$, then $n$ is not a Carmichael number.
(ii) If $n$ is square-free, then $n$ is a Carmichael number if and only if $p - 1 \mid n - 1$ for every prime $p \mid n$.
(iii) A Carmichael number is the product of at least three distinct primes.
Proof We first prove (i). Let $p^2 \mid n$ with $p$ a prime (take any prime factor $p$ of $a$). By Lemma 5.3, primitive roots mod $p^2$ exist; let $g$ be a primitive root mod $p^2$, that is, $o(g) = p(p-1)$. Let
$$n' = \prod_{p'^{\beta} \,\|\, n,\ p' \ne p} p'^{\beta},$$
where the $p'$ are primes, so that $n = p^\beta n'$ with $p \nmid n'$. By the Chinese remainder theorem, there is a positive integer $b$ such that
$$b \equiv g \pmod{p^2}, \qquad b \equiv 1 \pmod{n'}.$$
Then $b$ is a primitive root mod $p^2$, and $(b, n) = 1$. We claim that $n$ is not a Fermat pseudoprime to base $b$. If it were, then
$$b^{n-1} \equiv 1 \pmod n \ \Longrightarrow\ b^{n-1} \equiv 1 \pmod{p^2} \ \Longrightarrow\ o(b) \mid n - 1,$$
that is, $p(p-1) \mid n - 1$; but $p \mid n$ contradicts $p \mid n - 1$. So $b^{n-1} \not\equiv 1 \pmod n$, $n$ is not a Carmichael number, and (i) holds.
Now we prove (ii). If $p - 1 \mid n - 1$ for every prime $p \mid n$, then for every $b \in \mathbb{Z}_n^*$,
$$b^{n-1} = \left(b^{p-1}\right)^{\frac{n-1}{p-1}} \equiv 1 \pmod p, \quad \forall\, p \mid n.$$
Because $n$ is square-free, this gives
$$b^{n-1} \equiv 1 \pmod n, \quad \forall\, b \in \mathbb{Z}_n^*.$$
Therefore, $n$ is a Carmichael number. Conversely, suppose there is a prime $p \mid n$ with $p - 1 \nmid n - 1$. Let $g$ be a primitive root mod $p$; by the Chinese remainder theorem, choose $b$ with
$$b \equiv g \pmod p, \qquad b \equiv 1 \pmod{\tfrac{n}{p}}.$$
Then $(b, n) = 1$, and
$$b^{n-1} \equiv g^{n-1} \pmod p.$$
Since $p - 1 \nmid n - 1$, we have $g^{n-1} \not\equiv 1 \pmod p$, so $b^{n-1} \not\equiv 1 \pmod n$. This contradicts the assumption that $n$ is a Carmichael number, so (ii) holds.

To prove (iii), by (i) and (ii) we only need to exclude the case that $n$ is the product of two primes. Let $n = pq$ with $p < q$. If $n$ is a Carmichael number, then by (ii) $q - 1 \mid n - 1$; but
$$n - 1 = pq - 1 = p(q-1) + p - 1,$$
so
$$n - 1 \equiv p - 1 \pmod{q - 1}, \quad 0 < p - 1 < q - 1,$$
which contradicts $n - 1 \equiv 0 \pmod{q - 1}$. So $n = pq$ cannot be a Carmichael number, and the Theorem holds.


Below we give some examples of Carmichael numbers; by property (ii) of Theorem 5.1, it is easy to verify whether a square-free number is a Carmichael number.

Example 5.1 The following positive integers $n$ are Carmichael numbers:
$$n = 1105 = 5 \cdot 13 \cdot 17, \quad n = 1729 = 7 \cdot 13 \cdot 19, \quad n = 2465 = 5 \cdot 17 \cdot 29,$$
$$n = 2821 = 7 \cdot 13 \cdot 31, \quad n = 6601 = 7 \cdot 23 \cdot 41.$$


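Example 5.2 The positive integer $561 = 3 \cdot 11 \cdot 17$ is the smallest Carmichael number.

Proof By definition, a Carmichael number is odd and composite, and by Theorem 5.1 it is the product of at least three distinct odd primes. Consider first $n = 3 \cdot p \cdot q$, where $3 < p < q$ are primes with $p - 1 \mid n - 1$ and $q - 1 \mid n - 1$. For $p = 5$ and $p = 7$, the conditions
$$3pq \equiv 1 \pmod{p - 1}, \qquad 3pq \equiv 1 \pmod{q - 1}, \qquad q > p,$$
have no prime solution $q$; for $p = 11$, the smallest solution is $q = 17$, giving $n = 3 \cdot 11 \cdot 17 = 561$. The only products of three distinct odd primes below 561 that are not divisible by 3 are $385 = 5 \cdot 7 \cdot 11$ and $455 = 5 \cdot 7 \cdot 13$, and neither is a Carmichael number ($10 \nmid 384$ and $4 \nmid 454$). So $561 = 3 \cdot 11 \cdot 17$ is the smallest Carmichael number.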
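The criterion of Theorem 5.1 is easy to apply once $n$ is factored. The sketch below (names are ours; trial division is only suitable for small $n$) compares it with a direct check of Definition 5.2 and lists the Carmichael numbers below 3000.

```python
from math import gcd

def factorize(n):
    """Prime factorization {p: exponent} by trial division."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def is_carmichael_criterion(n):
    """Theorem 5.1: odd composite, square-free, and p - 1 | n - 1 for all p | n."""
    f = factorize(n)
    if n % 2 == 0 or sum(f.values()) < 2:   # exclude even n, 1 and primes
        return False
    return all(e == 1 and (n - 1) % (p - 1) == 0 for p, e in f.items())

def is_carmichael_definition(n):
    """Definition 5.2 checked directly: b^(n-1) ≡ 1 (mod n) for every unit b."""
    if n % 2 == 0 or sum(factorize(n).values()) < 2:
        return False
    return all(pow(b, n - 1, n) == 1 for b in range(2, n) if gcd(b, n) == 1)

print([n for n in range(3, 3000) if is_carmichael_criterion(n)])
# [561, 1105, 1729, 2465, 2821]
```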


Example 5.3 For a given prime $r > 3$, the system of congruences
$$rpq \equiv 1 \pmod{p - 1}, \qquad rpq \equiv 1 \pmod{q - 1}$$
has only finitely many distinct prime solutions $p, q$. We leave this conclusion for the reader to consider.


5.2 Euler Test 


Let $p > 2$ be an odd prime. The Euler test uses Euler's criterion for quadratic residues mod $p$ to test whether a positive integer $n$ is prime. As with the Fermat test, an $n$ that passes the test cannot be concluded to be prime, but an $n$ that fails the test is certainly not prime. For given positive integers $a$ and $n$ $(n > 1)$, solving the quadratic congruence $x^2 \equiv a \pmod n$ is a famous hard problem: no efficient general method is known when $n$ is composite. However, in the special case $n = p > 2$ an odd prime, we have a rich theory of quadratic residues mod $p$, including Gauss's famous quadratic reciprocity law and Euler's criterion, which form the core of elementary number theory. First, we introduce the Legendre symbol; let $p > 2$ be a given odd prime.

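$\mathbb{Z}_p^*$ is a cyclic group of order $p - 1$. For $a \in \mathbb{Z}$ with $(a, p) = 1$, we define the Legendre symbol as
$$\left(\frac{a}{p}\right) = \begin{cases} 1, & \text{if } x^2 \equiv a \pmod p \text{ is solvable},\\ -1, & \text{if } x^2 \equiv a \pmod p \text{ is unsolvable}.\end{cases}$$

If $(a, p) > 1$, that is, $p \mid a$, we let $\left(\frac{a}{p}\right) = 0$. Then the Legendre symbol $\left(\frac{a}{p}\right)$ is defined for every $a \in \mathbb{Z}$; it is a completely multiplicative function $\mathbb{Z} \to \{1, -1, 0\}$, that is, $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$, and
$$\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right) \quad \text{if } a \equiv b \pmod p.$$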
p p 


If $\left(\frac{a}{p}\right) = 1$, then $x^2 \equiv a \pmod p$ is solvable, and $a$ is called a quadratic residue mod $p$; if $\left(\frac{a}{p}\right) = -1$, then $x^2 \equiv a \pmod p$ is unsolvable, and $a$ is called a quadratic nonresidue mod $p$.

Lemma 5.5 Let $a \in \mathbb{Z}$ with $p \nmid a$. Then $a$ is a quadratic residue mod $p$ if and only if
$$a^{\frac{p-1}{2}} \equiv 1 \pmod p.$$


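Proof $\mathbb{Z}_p^*$ is a cyclic group of order $p - 1$. Let $g$ be a primitive root mod $p$, that is, a generator of $\mathbb{Z}_p^*$; then for every $a \in \mathbb{Z}$ with $(a, p) = 1$,
$$a \equiv g^t \pmod p, \quad 1 \le t \le p - 1.$$
Clearly, $a$ is a quadratic residue mod $p$ if and only if $t$ is even (if $a \equiv (g^s)^2$, then $t \equiv 2s \pmod{p-1}$, and $p - 1$ is even). Therefore, if $t$ is even,
$$a^{\frac{p-1}{2}} \equiv g^{\frac{t(p-1)}{2}} = \left(g^{p-1}\right)^{\frac{t}{2}} \equiv 1 \pmod p.$$
Conversely, if $a^{\frac{p-1}{2}} \equiv 1 \pmod p$, then $o(a) \mid \frac{p-1}{2}$, and by Lemma 5.1,
$$o(a) = o(g^t) = \frac{p-1}{(t, p-1)}.$$
So
$$\frac{p-1}{(t, p-1)} \,\Big|\, \frac{p-1}{2} \ \Longrightarrow\ 2 \mid (t, p-1) \ \Longrightarrow\ 2 \mid t,$$
that is, $t$ is even; thus $a$ is a quadratic residue mod $p$. The Lemma holds.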


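Lemma 5.6 (Euler's criterion) For every $a \in \mathbb{Z}$,
$$a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod p. \qquad (5.5)$$

Proof If $(a, p) > 1$, that is, $p \mid a$, both sides are $\equiv 0$ and the formula holds. So we may assume $p \nmid a$. By Fermat's congruence theorem, $a^{p-1} \equiv 1 \pmod p$, so
$$\left(a^{\frac{p-1}{2}} + 1\right)\left(a^{\frac{p-1}{2}} - 1\right) \equiv 0 \pmod p.$$
Thus
$$a^{\frac{p-1}{2}} \equiv \pm 1 \pmod p.$$
If $a^{\frac{p-1}{2}} \equiv 1 \pmod p$, then $\left(\frac{a}{p}\right) = 1$ by Lemma 5.5. If $a^{\frac{p-1}{2}} \equiv -1 \pmod p$, then $a^{\frac{p-1}{2}} \not\equiv 1 \pmod p$ (as $p > 2$), so $\left(\frac{a}{p}\right) = -1$, again by Lemma 5.5. So (5.5) holds.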
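Euler's criterion (5.5) gives the Legendre symbol with a single modular exponentiation. This sketch (names are ours) compares it with the definition, i.e. with membership in the set of nonzero squares mod $p$, for a few small primes.

```python
def legendre_euler(a, p):
    """Legendre symbol (a/p) via Euler's criterion (5.5); p an odd prime."""
    r = pow(a, (p - 1) // 2, p)    # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def legendre_definition(a, p):
    """(a/p) from the definition: 0 if p | a, else 1 iff x^2 ≡ a (mod p) is solvable."""
    if a % p == 0:
        return 0
    squares = {(x * x) % p for x in range(1, p)}
    return 1 if a % p in squares else -1

for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
    for a in range(-10, 3 * p):
        assert legendre_euler(a, p) == legendre_definition(a, p)
```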


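Definition 5.3 Suppose $n$ is an odd composite number. If there is an integer $b$ with $(b, n) = 1$ satisfying
$$b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) \pmod n, \qquad (5.6)$$
then $n$ is called an Euler pseudoprime to base $b$, where $\left(\frac{b}{n}\right)$ is the Jacobi symbol, defined as
$$\left(\frac{b}{n}\right) = \left(\frac{b}{p_1}\right)^{\alpha_1}\left(\frac{b}{p_2}\right)^{\alpha_2}\cdots\left(\frac{b}{p_s}\right)^{\alpha_s}, \quad n = p_1^{\alpha_1}p_2^{\alpha_2}\cdots p_s^{\alpha_s}. \qquad (5.7)$$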


From the definition, we immediately obtain a corollary: if $n$ is an Euler pseudoprime to base $b$, then $n$ is a Fermat pseudoprime to base $b$. This follows by squaring both sides of (5.6), since $\left(\frac{b}{n}\right)^2 = 1$.

The following example shows that the converse is false; that is, $n$ may be a Fermat pseudoprime to base $b$ without being an Euler pseudoprime to base $b$.


Example 5.4 $n = 91$ is a Fermat pseudoprime to base $b = 3$, but not an Euler pseudoprime. In fact, it is easy to calculate that $3^6 \equiv 1 \pmod{91}$, so $3^{90} \equiv 1 \pmod{91}$. From $3^6 \equiv 1 \pmod{91}$ we also have
$$3^{42} \equiv 1 \pmod{91} \ \Longrightarrow\ 3^{45} \equiv 3^3 = 27 \pmod{91}.$$
Since $27 \not\equiv \pm 1 \pmod{91}$, 91 is not an Euler pseudoprime to base 3.


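Example 5.5 $n = 91$ is an Euler pseudoprime to base $b = 10$. Because $10^3 = 1000 = 11 \cdot 91 - 1$,
$$10^{45} \equiv 10^3 \equiv -1 \pmod{91},$$
and computing the Jacobi symbol from Legendre symbols,
$$\left(\frac{10}{91}\right) = \left(\frac{10}{7}\right)\left(\frac{10}{13}\right) = (-1) \cdot 1 = -1,$$
so $n = 91$ is an Euler pseudoprime to base $b = 10$.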
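The computations in Examples 5.4 and 5.5 can be reproduced with a short sketch. Here the Jacobi symbol is evaluated directly from its definition (5.7), which requires factoring $n$ by trial division; this is only reasonable for small $n$ and is our simplification (all names are hypothetical).

```python
from math import gcd

def factorize(n):
    """Prime factorization {p: exponent} by trial division."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def legendre(a, p):
    """Legendre symbol via Euler's criterion (5.5)."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def jacobi_by_definition(b, n):
    """Jacobi symbol (b/n) for odd n > 1, as the product (5.7)."""
    result = 1
    for p, e in factorize(n).items():
        result *= legendre(b, p) ** e
    return result

def passes_euler_test(n, b):
    """Does n satisfy (5.6) to base b, i.e. b^((n-1)/2) ≡ (b/n) (mod n)?"""
    if gcd(b, n) != 1:
        return False
    return pow(b, (n - 1) // 2, n) == jacobi_by_definition(b, n) % n
```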


From Euler's criterion (Lemma 5.6), we can easily compute the Legendre symbols of $-1$ and $2$.
Lemma 5.7 Let $p > 2$ be an odd prime. Then
$$\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}, \qquad \left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}}. \qquad (5.8)$$


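Proof By Lemma 5.6,
$$(-1)^{\frac{p-1}{2}} \equiv \left(\frac{-1}{p}\right) \pmod p.$$
Since both sides are $\pm 1$ and $p > 2$, we have $\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}}$. To calculate the Legendre symbol of 2, note that
$$\begin{aligned} p - 1 &\equiv 1 \cdot (-1)^1 \pmod p,\\ 2 &\equiv 2 \cdot (-1)^2 \pmod p,\\ p - 3 &\equiv 3 \cdot (-1)^3 \pmod p,\\ &\ \ \vdots\\ r &\equiv \tfrac{p-1}{2} \cdot (-1)^{\frac{p-1}{2}} \pmod p,\end{aligned}$$
where $r = \frac{p-1}{2}$ if $\frac{p-1}{2}$ is even, and $r = p - \frac{p-1}{2}$ if $\frac{p-1}{2}$ is odd. The left-hand sides are exactly the even numbers $2, 4, \ldots, p - 1$ in some order, so multiplying these congruences gives
$$2 \cdot 4 \cdot 6 \cdots (p-1) \equiv \left(\tfrac{p-1}{2}\right)!\,(-1)^{1 + 2 + \cdots + \frac{p-1}{2}} = \left(\tfrac{p-1}{2}\right)!\,(-1)^{\frac{p^2-1}{8}} \pmod p.$$
Since the left side equals $2^{\frac{p-1}{2}}\left(\frac{p-1}{2}\right)!$ and $p \nmid \left(\frac{p-1}{2}\right)!$, this means
$$2^{\frac{p-1}{2}} \equiv (-1)^{\frac{p^2-1}{8}} \pmod p.$$
By Lemma 5.6,
$$\left(\frac{2}{p}\right) \equiv (-1)^{\frac{p^2-1}{8}} \pmod p,$$
and since both sides are $\pm 1$,
$$\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}}.$$
Lemma 5.7 holds.

Let $\left(\frac{b}{n}\right)$ be the Jacobi symbol defined by (5.7); then Lemma 5.7 can be extended to the Jacobi symbol.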
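A quick numerical check of (5.8) over the odd primes below 1000, using Euler's criterion for the Legendre symbol. This is a sanity check over a range of our own choosing (helper names are ours), not a proof.

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion (5.5)."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def odd_primes(limit):
    """Odd primes below limit, by trial division."""
    return [q for q in range(3, limit, 2)
            if all(q % d for d in range(3, int(q ** 0.5) + 1, 2))]

for p in odd_primes(1000):
    assert legendre(-1, p) == (-1) ** ((p - 1) // 2)
    assert legendre(2, p) == (-1) ** ((p * p - 1) // 8)
```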


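Lemma 5.8 Let $n$ be a positive odd number. Then
$$\left(\frac{-1}{n}\right) = (-1)^{\frac{n-1}{2}}, \qquad \left(\frac{2}{n}\right) = (-1)^{\frac{n^2-1}{8}}. \qquad (5.9)$$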




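Proof The square of any odd number is congruent to 1 modulo 8, that is, $a^2 \equiv 1 \pmod 8$ for odd $a$. Write $n = a^2 p_1 p_2 \cdots p_t$, where the $p_i$ are distinct primes; then
$$n \equiv p_1 p_2 \cdots p_t \pmod 8.$$
Similarly, for every $b \in \mathbb{Z}$ with $(b, n) = 1$, by (5.7),
$$\left(\frac{b}{n}\right) = \left(\frac{b}{p_1}\right)\left(\frac{b}{p_2}\right)\cdots\left(\frac{b}{p_t}\right), \qquad (5.10)$$
since the factor $a^2$ contributes $\left(\frac{b}{a}\right)^2 = 1$. Thus, by Lemma 5.7,
$$\left(\frac{-1}{n}\right) = \left(\frac{-1}{p_1}\right)\cdots\left(\frac{-1}{p_t}\right) = (-1)^{\frac{p_1-1}{2} + \cdots + \frac{p_t-1}{2}} = (-1)^{\frac{p_1 \cdots p_t - 1}{2}} = (-1)^{\frac{n-1}{2}}. \qquad (5.11)$$
Here we used that $\frac{xy-1}{2} \equiv \frac{x-1}{2} + \frac{y-1}{2} \pmod 2$ for odd $x, y$ (the difference is $\frac{(x-1)(y-1)}{2}$), and that $(-1)^{\frac{n-1}{2}}$ depends only on $n \bmod 4$. The formula for $\left(\frac{2}{n}\right)$ is proved in the same way, using $\frac{x^2y^2-1}{8} \equiv \frac{x^2-1}{8} + \frac{y^2-1}{8} \pmod 2$ and the fact that $(-1)^{\frac{n^2-1}{8}}$ depends only on $n \bmod 8$. The Lemma holds.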


Corollary 5.1 Every odd composite number $n$ is an Euler pseudoprime to the bases $\pm 1$.

Proof It is trivial that $n$ is an Euler pseudoprime to base 1; that $n$ is an Euler pseudoprime to base $-1$ follows directly from Lemma 5.8.


Lemma 5.9 (Gauss) Let $p$ and $q$ be two distinct odd primes. Then
$$\left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$


Proof According to incomplete statistics, there are currently more than 270 known proofs of Gauss's quadratic reciprocity law. To save space, we leave the proof to the reader, hoping that everyone can find a favorite proof.

Next, we discuss the computational complexity of the Fermat test and the Euler test.


Lemma 5.10 Let $n$ be odd, $1 < b < n$, $(b, n) = 1$. Then
$$\mathrm{Time}(\text{Fermat test of } n \text{ to base } b) = O(\log^3 n),$$
$$\mathrm{Time}(\text{Euler test of } n \text{ to base } b) = O(\log^4 n).$$


Proof By (5.1), the Fermat test of $n$ to base $b$ is the computation of $b^{n-1} \bmod n$; by Lemma 1.5 of Chap. 1, the number of bit operations is
$$\mathrm{Time}(b^{n-1} \bmod n) = O(\log n \cdot \log^2 n) = O(\log^3 n).$$
For the Euler test of $n$ to base $b$, by (5.6), computing the left-hand side takes $O(\log^3 n)$ bit operations. To find the Jacobi symbol $\left(\frac{b}{n}\right)$, by (5.7) and the quadratic reciprocity law the calculation can be reduced to the calculation of Legendre symbols. Each application of the reciprocity law is essentially one division, so we only need to consider the calculation of Legendre symbols. By Euler's criterion,
$$\mathrm{Time}\left(\text{computing } \left(\tfrac{b}{p}\right)\right) = \mathrm{Time}\left(b^{\frac{p-1}{2}} \bmod p\right) = O(\log^3 n).$$
The number of prime factors of $n$ is on average $O(\log\log n)$, so
$$\mathrm{Time}\left(\text{computing the Jacobi symbol } \left(\tfrac{b}{n}\right)\right) = O(\log\log n \cdot \log^3 n) = O(\log^4 n);$$
even the crude bound $O(\log n)$ on the number of prime factors gives $O(\log^4 n)$. This completes the proof of Lemma 5.10.


Solovay and Strassen proposed a probabilistic method for detecting primes with the Euler test in 1977. For an odd number $n > 1$, $k$ numbers $b_1, b_2, \ldots, b_k$ with $1 < b_i < n$, $(b_i, n) = 1$ are selected at random. For each $b$ in turn, both sides of (5.6) are computed, which requires $O(\log^4 n)$ bit operations; if the two sides of (5.6) are not equal, then $n$ is not a prime and the test terminates. If all $k$ bases pass the Euler test (5.6), then the probability that $n$ is composite is less than $\frac{1}{2^k}$, that is,
$$P\{n \text{ is not prime}\} < \frac{1}{2^k}.$$

This bound comes from the fact that, for an odd composite $n$, the bases $b \in \mathbb{Z}_n^*$ satisfying (5.6) form a proper subgroup of $\mathbb{Z}_n^*$ and therefore make up at most half of it (compare Lemma 5.4(iii)). Next we introduce the Miller-Rabin method, which is in a sense better than the Solovay-Strassen method.
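A sketch of the Solovay-Strassen procedure just described. Unlike (5.7), the Jacobi symbol here is computed without factoring $n$, by repeatedly applying the rule for $\left(\frac{2}{n}\right)$ from Lemma 5.8 and the Jacobi-symbol form of quadratic reciprocity; this standard algorithm is our addition, and the function names and the default of 20 rounds are assumptions.

```python
import random
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed without factoring n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:                 # (2/n) = (-1)^((n^2 - 1)/8)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                       # reciprocity: sign flips iff a ≡ n ≡ 3 (mod 4)
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0       # 0 when gcd(a, n) > 1

def passes_euler_test(n, b):
    """Condition (5.6) for odd n: b^((n-1)/2) ≡ (b/n) (mod n) with (b, n) = 1."""
    if gcd(b, n) != 1:
        return False
    return pow(b, (n - 1) // 2, n) == jacobi(b, n) % n

def solovay_strassen(n, k=20, rng=random):
    """False: n is certainly composite. True: n passed k random Euler tests."""
    if n in (2, 3):
        return True
    if n < 2 or n % 2 == 0:
        return False
    for _ in range(k):
        if not passes_euler_test(n, rng.randrange(2, n - 1)):
            return False
    return True
```

For instance, 561 passes the Euler test to base 2 but fails it to base 5, so a single random base can expose it.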


Definition 5.4 Let $n$ be an odd composite number, and write $n - 1 = 2^t \cdot m$, where $t \ge 1$ and $m$ is odd. Let $b \in \mathbb{Z}_n^*$. If $n$ and $b$ satisfy one of the following conditions,
$$b^m \equiv 1 \pmod n, \quad \text{or there exists } r,\ 0 \le r < t, \text{ such that } b^{2^r m} \equiv -1 \pmod n, \qquad (5.12)$$
then $n$ is called a strong pseudoprime to base $b$.


Lemma 5.11 Suppose $n \equiv 3 \pmod 4$. Then $n$ is a strong pseudoprime to base $b$ if and only if $n$ is an Euler pseudoprime to base $b$.

Proof Because $n \equiv 3 \pmod 4$, we have $n - 1 = 2m$, that is, $t = 1$ and $m = \frac{1}{2}(n-1)$. By Definition 5.4, $n$ is a strong pseudoprime to base $b$ if and only if
$$b^m = b^{\frac{n-1}{2}} \equiv \pm 1 \pmod n.$$
Therefore, if $n$ is an Euler pseudoprime to base $b$, the above formula holds, so $n$ is also a strong pseudoprime to base $b$. Conversely, suppose the above formula holds. Because $n \equiv 3 \pmod 4$, $\frac{1}{2}(n-1)$ is odd, so by Lemma 5.8 $\left(\frac{-1}{n}\right) = -1$, and
$$\left(\frac{b}{n}\right) = \left(\frac{b}{n}\right)^{\frac{n-1}{2}} = \left(\frac{b^{\frac{n-1}{2}}}{n}\right) = \left(\frac{\pm 1}{n}\right) \equiv b^{\frac{n-1}{2}} \pmod n.$$
Therefore, $n$ is an Euler pseudoprime to base $b$. The Lemma holds.


Below we give the main result of this section.

Theorem 5.2 Let $n$ be an odd composite number and $b \in \mathbb{Z}_n^*$. Then

(i) If $n$ is a strong pseudoprime to base $b$, then $n$ is an Euler pseudoprime to base $b$.

(ii) If $n > 9$, then among the bases $b$ with $1 \le b < n$, $(b, n) = 1$, those to which $n$ is a strong pseudoprime make up at most 25%.


Before proving Theorem 5.2, we introduce the Miller-Rabin test. To test whether a large odd number $n$ is prime, write $n - 1 = 2^t \cdot m$ with $m$ odd and $t \ge 1$, and select $b$ at random with $1 < b < n$, $(b, n) = 1$. First calculate $b^m \bmod n$; if the result is $\pm 1$, then $n$ passes the strong pseudoprime test (5.12). If $b^m \bmod n \ne \pm 1$, square it, take the least nonnegative residue of the square mod $n$, and check whether the result is $-1$; repeat this up to $t - 1$ times. If $-1$ never appears, then $n$ fails test (5.12) to base $b$, and we conclude that $n$ is not a strong pseudoprime to base $b$. If $-1$ is obtained after $r$ squarings, then $n$ passes the test to base $b$.

In the Miller-Rabin test, if $n$ fails test (5.12) to base $b$, then $n$ is certainly not prime. If $n$ passes the test for $k$ randomly selected bases $b_1, b_2, \ldots, b_k$, then by property (ii) of Theorem 5.2, each $b_i$ is, for a composite $n$, a base passing the test with probability at most 25%, so
$$P\{n \text{ not prime}\} < \frac{1}{4^k}. \qquad (5.13)$$

Compared with the Solovay-Strassen method using the Euler test, the Miller-Rabin method using the strong pseudoprime test is more powerful.
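A sketch of the Miller-Rabin test as described above. The function names, the default of 20 rounds and the use of Python's `random` module are our assumptions; `count_strong_liars` simply counts the units allowed by (5.12). For $n = 91 = 7 \cdot 13$ it finds 18 of the 72 units, so the 25% bound of Theorem 5.2(ii) is attained; since $91 \equiv 3 \pmod 4$, the bases 10 and 3 also behave exactly as in Examples 5.5 and 5.4, in line with Lemma 5.11.

```python
import random
from math import gcd

def is_strong_probable_prime(n, b):
    """Condition (5.12) for odd n > 2: with n - 1 = 2^t m (m odd), return True
    if b^m ≡ 1 or b^(2^r m) ≡ -1 (mod n) for some 0 <= r < t."""
    t, m = 0, n - 1
    while m % 2 == 0:
        m //= 2
        t += 1
    x = pow(b, m, n)
    if x == 1 or x == n - 1:          # b^m ≡ ±1 (mod n)
        return True
    for _ in range(t - 1):            # r = 1, ..., t - 1
        x = (x * x) % n
        if x == n - 1:
            return True
    return False

def miller_rabin(n, k=20, rng=random):
    """False: n is certainly composite. True: n passed k random bases."""
    if n in (2, 3):
        return True
    if n < 2 or n % 2 == 0:
        return False
    return all(is_strong_probable_prime(n, rng.randrange(2, n - 1)) for _ in range(k))

def count_strong_liars(n):
    """Number of b in Z_n^* to which the odd composite n is a strong pseudoprime."""
    return sum(1 for b in range(1, n) if gcd(b, n) == 1 and is_strong_probable_prime(n, b))
```

Note that $2047 = 23 \cdot 89$ is a strong pseudoprime to base 2, so a single fixed base never gives certainty.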
To prove Theorem 5.2, we first prove the following two lemmas.


Lemma 5.12 Let $G = \langle g \rangle$ be a cyclic group of order $m$, that is, $o(g) = m$. Then the equation $x^k = 1$ has exactly $d$ solutions in $G$, where $d = (k, m)$.

Proof For $x \in G$, write $x = g^t$. Then
$$x^k = g^{kt} = 1 \iff m \mid kt \iff \frac{m}{d} \,\Big|\, \frac{k}{d}\, t \iff \frac{m}{d} \,\Big|\, t.$$
Let $t = \frac{m}{d} \cdot s$; then $s = 1, 2, \ldots, d$ give exactly $d$ solutions $x = g^t$. The Lemma holds.
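Since $\mathbb{Z}_p^*$ is cyclic of order $p - 1$, Lemma 5.12 predicts that $x^k \equiv 1 \pmod p$ has exactly $(k, p - 1)$ solutions. This brute-force sketch (the name `count_solutions` is ours) checks that prediction for a few primes.

```python
from math import gcd

def count_solutions(k, p):
    """Number of x in Z_p^* with x^k ≡ 1 (mod p), by brute force."""
    return sum(1 for x in range(1, p) if pow(x, k, p) == 1)

# Z_p^* is cyclic of order p - 1, so Lemma 5.12 predicts gcd(k, p - 1)
for p in (7, 11, 13, 31, 97):
    for k in range(1, 60):
        assert count_solutions(k, p) == gcd(k, p - 1)
```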


Lemma 5.13 Let $p$ be an odd prime with $p - 1 = 2^t m'$, $t \ge 1$, $m'$ odd. Then the number $N$ of solutions in $\mathbb{Z}_p^*$ of
$$x^{2^r m} \equiv -1 \pmod p, \quad m \text{ odd}, \qquad (5.14)$$
is
$$N = \begin{cases} 0, & \text{if } r \ge t;\\ 2^r (m, m'), & \text{if } r < t.\end{cases}$$


Proof Let g be a generator of Z*, write x = g/, 1 < j < p — 1, because o(g) = 


p 
p-— 1, so 
p-l 


g ? = -—l(mod p). 


Thus 


l -1 
x?" = —1(mod p) & lmj = “(mod p =i) 


Namely, 
2' mj = O(mod p — 1). 


Because p — 1 = 2'm', the above formula is equivalent to 
2'mj = 2! ! m' (mod 2 m^). (5.15) 


If r > t — 1, then the congruence has no solution to j, because m and m’ are odd 
numbers, so when r > t, (5.14) is unsolvable. If r < t, let d = (m, m^), then 


Q"m, 2 m!) = 2' d, 


then Eq. (5.15) has exactly d solutions for j. Each j corresponds to one x = g/, then 
the number of solutions of Eq. (5.14) to x is N — 2'd, the Lemma holds. 
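The count in Lemma 5.13 can be checked the same way. In this hypothetical sketch, `predicted_count` evaluates the formula for $N$ and `brute_count` counts the solutions of (5.14) directly in $\mathbb{Z}_p^*$; both names are our own.

```python
from math import gcd

def predicted_count(p: int, r: int, m: int) -> int:
    """N from Lemma 5.13 for x^(2^r m) = -1 (mod p), m odd."""
    t, m_prime = 0, p - 1
    while m_prime % 2 == 0:            # p - 1 = 2^t m', m' odd
        m_prime //= 2
        t += 1
    return 0 if r >= t else 2 ** r * gcd(m, m_prime)

def brute_count(p: int, r: int, m: int) -> int:
    """Direct count of x in Z_p^* with x^(2^r m) = -1 (mod p)."""
    e = 2 ** r * m
    return sum(1 for x in range(1, p) if pow(x, e, p) == p - 1)

for p in (3, 5, 7, 13, 17, 41, 97, 113):
    for r in range(6):
        for m in (1, 3, 5, 7, 9, 15, 21, 105):
            assert predicted_count(p, r, m) == brute_count(p, r, m)
```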


With the above preparation, we now give the proof of Theorem 5.2. 


Proof (The proof of Theorem 5.2). We first prove (i): if $n$ and $b$ satisfy (5.12), we want to prove that (5.6) holds, that is, $b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) \pmod n$. In other words, if $n$ is a strong pseudoprime to base $b$, then $n$ is an Euler pseudoprime to base $b$. Write $n - 1 = 2^t m$ with $m$ odd; we prove property (i) of Theorem 5.2 in three cases.


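(1) $b^m \equiv 1 \pmod n$. In this case, it is obvious that $b^{\frac{n-1}{2}} \equiv 1 \pmod n$, since $\frac{n-1}{2} = 2^{t-1} m$. Let us prove $\left(\frac{b}{n}\right) = 1$. In fact, since $m$ is odd,
$$\left(\frac{b}{n}\right) = \left(\frac{b}{n}\right)^m = \left(\frac{b^m}{n}\right) = \left(\frac{1}{n}\right) = 1,$$
so
$$b^{\frac{n-1}{2}} \equiv \left(\frac{b}{n}\right) = 1 \pmod n.$$
That is, $n$ is an Euler pseudoprime to base $b$.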


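(2) $b^{\frac{n-1}{2}} \equiv -1 \pmod n$. In this case, we have to prove $\left(\frac{b}{n}\right) = -1$. Let $p \mid n$ be any prime factor of $n$ and write $p - 1 = 2^{t_p} m_p$, where $t_p \ge 1$ and $m_p$ is odd. Let us calculate the Legendre symbol $\left(\frac{b}{p}\right)$. In fact, $t_p \ge t$, and
$$\left(\frac{b}{p}\right) = \begin{cases} -1, & \text{if } t_p = t; \\ 1, & \text{if } t_p > t. \end{cases} \tag{5.16}$$
Because
$$b^{\frac{n-1}{2}} = b^{2^{t-1} m} \equiv -1 \pmod n$$
and $p \mid n$, we have
$$b^{2^{t-1} m} \equiv -1 \pmod p. \tag{5.17}$$
If $t_p < t$, raising (5.17) to the odd power $m_p$ gives
$$b^{2^{t-1} m m_p} \equiv -1 \pmod p,$$
while $p - 1 = 2^{t_p} m_p$ divides $2^{t-1} m m_p$, so $b^{2^{t-1} m m_p} \equiv 1 \pmod p$ by Fermat's little theorem. This contradiction shows that we always have $t_p \ge t$. If $t_p = t$, then by (5.17),
$$\left(\frac{b}{p}\right) = -1,$$
because if the Legendre symbol were $1$, Euler's criterion would give $b^{2^{t-1} m_p} = b^{\frac{p-1}{2}} \equiv 1 \pmod p$; raising both sides to the $m$-th power gives $b^{2^{t-1} m m_p} \equiv 1 \pmod p$, which contradicts (5.17) raised to the $m_p$-th power. If $t_p > t$, raise both sides of (5.17) to the power $2^{t_p - t} m_p$; then $\left(b^{\frac{p-1}{2}}\right)^m \equiv 1 \pmod p$, that is, $\left(\frac{b}{p}\right)^m = 1$, so $\left(\frac{b}{p}\right) = 1$. This proves (5.16).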

We now complete the proof of case (2) using (5.16). Write $n = \prod p$, where the primes $p$ are not required to be different, and define the integer $k$ as
$$k = \#\{p : p \mid n,\ p - 1 = 2^{t_p} m_p,\ m_p \text{ odd},\ t_p = t\},$$
counted with multiplicity. By (5.16),
$$\left(\frac{b}{n}\right) = \prod_{p} \left(\frac{b}{p}\right) = (-1)^k. \tag{5.18}$$
Let us prove that $k$ is odd. Because $t_p \ge t$, $p - 1 = 2^{t_p} m_p$ and $n - 1 = 2^t m$, modulo $2^{t+1}$ we have
$$p \equiv \begin{cases} 1 \pmod{2^{t+1}}, & \text{if } t_p > t; \\ 1 + 2^t \pmod{2^{t+1}}, & \text{if } t_p = t. \end{cases}$$
Because $n \equiv 1 + 2^t \pmod{2^{t+1}}$ and $t \ge 1$,
$$1 + 2^t \equiv n = \prod_p p \equiv (1 + 2^t)^k \equiv 1 + k \cdot 2^t \pmod{2^{t+1}}.$$
So $k$ must be odd, and by (5.18), $\left(\frac{b}{n}\right) = -1$. Case (2) is proved.

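(3) $b^{2^r m} \equiv -1 \pmod n$, where $0 \le r < t - 1$ and $n - 1 = 2^t m$.

In this case, squaring gives $b^{2^{r+1} m} \equiv 1 \pmod n$. Because $r + 1 \le t - 1$,
$$b^{\frac{n-1}{2}} = \left(b^{2^{r+1} m}\right)^{2^{t-2-r}} \equiv 1 \pmod n.$$
To prove property (i) of Theorem 5.2, we have to prove $\left(\frac{b}{n}\right) = 1$. As in case (2), let $p \mid n$ and write $p - 1 = 2^{t_p} m_p$ with $m_p$ odd; then $t_p \ge r + 1$, and
$$\left(\frac{b}{p}\right) = \begin{cases} -1, & \text{if } t_p = r + 1; \\ 1, & \text{if } t_p > r + 1. \end{cases} \tag{5.19}$$
The proof of (5.19) is the same as that of (5.16) in case (2), with $t - 1$ replaced by $r$. Write $n = \prod p$, where the primes $p$ are not required to be different, and define the integer
$$k_1 = \#\{p : p \mid n,\ p - 1 = 2^{t_p} m_p,\ m_p \text{ odd},\ t_p = r + 1\}.$$
As in case (2), we have $\left(\frac{b}{n}\right) = (-1)^{k_1}$. Similarly, working modulo $2^{r+2}$ and using $n \equiv 1 \pmod{2^{r+2}}$ (because $t \ge r + 2$), we get
$$1 \equiv n \equiv (1 + 2^{r+1})^{k_1} \equiv 1 + k_1 \cdot 2^{r+1} \pmod{2^{r+2}},$$
so $k_1$ must be even. Thus $\left(\frac{b}{n}\right) = 1$, and we have completed the proof of property (i) of Theorem 5.2.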

Next, we prove property (ii) in Theorem 5.2. It is also discussed in three cases. 


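(1) $n$ is divisible by a square; that is, there is a prime $p$ with $p^\alpha \,\|\, n$, $\alpha \ge 2$.

In this case, we prove that for at least $\frac{3}{4}$ of the bases $b$, $1 \le b < n$, $n$ is not even a Fermat pseudoprime to base $b$, let alone a strong pseudoprime. Suppose $b^{n-1} \equiv 1 \pmod n$; since $p^2 \mid n$, then $b^{n-1} \equiv 1 \pmod{p^2}$. Because $p$ is an odd prime, $\mathbb{Z}_{p^2}^*$ is a cyclic group of order $p(p-1)$. Let $g$ be a generator of $\mathbb{Z}_{p^2}^*$; then
$$\mathbb{Z}_{p^2}^* = \{g^0, g^1, g^2, \ldots, g^{p(p-1)-1}\}.$$
By Lemma 5.12, the number of residues $b$ modulo $p^2$ satisfying $b^{n-1} \equiv 1 \pmod{p^2}$ is $d$, where
$$d = (n - 1, p(p - 1)) = (n - 1, p - 1).$$
Because $p \mid n$, we have $p \nmid n - 1$ and so $p \nmid d$; therefore, $d \le p - 1$. Whether $b^{n-1} \equiv 1 \pmod{p^2}$ holds depends only on $b \bmod p^2$, and each residue modulo $p^2$ occurs $n/p^2$ times among $0 \le b < n$. Hence, since $n \ge p^2$, the proportion of $b$, $1 \le b < n$, satisfying $b^{n-1} \equiv 1 \pmod{p^2}$ does not exceed
$$\frac{d \cdot n/p^2}{n - 1} \le \frac{p - 1}{p^2 - 1} = \frac{1}{p + 1} \le \frac{1}{4}.$$
Therefore, at most $\frac{1}{4}$ of the bases $b$ make $n$ a Fermat pseudoprime, and property (ii) of Theorem 5.2 is proved in case (1).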

(2) $n = pq$, where $p$ and $q$ are two different primes.

In this case, let $p - 1 = 2^{t_1} m_1$ and $q - 1 = 2^{t_2} m_2$, with $m_1, m_2$ odd. Without loss of generality, let $t_1 \le t_2$. Let $b \in \mathbb{Z}_n^*$; for $n$ to be a strong pseudoprime to base $b$, it is necessary that
$$b^m \equiv 1 \pmod p, \quad b^m \equiv 1 \pmod q \tag{5.20}$$
or
$$b^{2^r m} \equiv -1 \pmod p, \quad b^{2^r m} \equiv -1 \pmod q, \quad 0 \le r < t. \tag{5.21}$$
By Lemma 5.12, the number of $b$ satisfying (5.20) is $(m, m_1)(m, m_2) \le m_1 m_2$. By Lemma 5.13, (5.21) has no solution when $r \ge t_1$, and for each $r$ with $0 \le r < \min(t_1, t_2) = t_1$, the number of $b$ satisfying $b^{2^r m} \equiv -1 \pmod n$ is $2^r (m, m_1) \cdot 2^r (m, m_2) \le 4^r m_1 m_2$. Because $n = pq$, we have $\varphi(n) = (p - 1)(q - 1) = 2^{t_1 + t_2} m_1 m_2$. Therefore, among $b \in \mathbb{Z}_n^*$ (and hence also among all $b$, $1 \le b < n$), the proportion of bases to which $n$ is a strong pseudoprime does not exceed
$$\frac{m_1 m_2 + m_1 m_2 + 4 m_1 m_2 + \cdots + 4^{t_1 - 1} m_1 m_2}{2^{t_1 + t_2} m_1 m_2} = 2^{-t_1 - t_2}\left(1 + \frac{4^{t_1} - 1}{3}\right). \tag{5.22}$$
If $t_1 < t_2$, then the above does not exceed
$$2^{-2t_1 - 1}\left(\frac{2}{3} + \frac{4^{t_1}}{3}\right) = \frac{1}{6} + \frac{1}{3 \cdot 4^{t_1}} \le \frac{1}{6} + \frac{1}{12} = \frac{1}{4}.$$
If $t_1 = t_2$, then $m_1 \ne m_2$, so $(m, m_1) \le m_1$ and $(m, m_2) \le m_2$, and at least one of these inequalities must be strict. The reason is that if both were equalities, then $m_1 \mid m$ and $m_2 \mid m$. Since $n - 1 = 2^t m$ and $n - 1 = pq - 1 \equiv q - 1 \pmod{m_1}$, we would get $m_1 \mid q - 1 = 2^{t_2} m_2$, so $m_1 \mid m_2$; symmetrically $m_2 \mid m_1$, hence $m_1 = m_2$, a contradiction. Since a proper divisor of the odd number $m_i$ is at most $m_i/3$, we have
$$(m, m_1) \cdot (m, m_2) \le \frac{1}{3} m_1 m_2.$$
Substituting $\frac{1}{3} m_1 m_2$ for $m_1 m_2$ in (5.22), the proportion of bases $b$ to which $n$ is a strong pseudoprime does not exceed
$$\frac{1}{3} \cdot 2^{-2t_1}\left(1 + \frac{4^{t_1} - 1}{3}\right) = \frac{1}{9} + \frac{2}{9 \cdot 4^{t_1}} \le \frac{1}{9} + \frac{1}{18} = \frac{1}{6} < \frac{1}{4}.$$
We have completed the proof of property (ii) of Theorem 5.2 in case (2).

(3) Finally, suppose $n = p_1 p_2 \cdots p_k$, $k \ge 3$, is a product of different primes. In this case, write $p_j - 1 = 2^{t_j} m_j$ with $m_j$ odd. As in case (2), without loss of generality we may assume $t_1 \le t_j$ $(1 \le j \le k)$. Similarly to the proof of (5.22), the proportion of $b$ to which $n$ is a strong pseudoprime does not exceed
$$\begin{aligned} 2^{-t_1 - t_2 - \cdots - t_k}\left(1 + \frac{2^{k t_1} - 1}{2^k - 1}\right) &\le 2^{-k t_1} \cdot \frac{2^k - 2}{2^k - 1} + \frac{1}{2^k - 1} \\ &\le 2^{-k} \cdot \frac{2^k - 2}{2^k - 1} + \frac{1}{2^k - 1} = 2^{1-k} \le \frac{1}{4}, \end{aligned}$$
because $k \ge 3$. In this way, we have completed the proof of Theorem 5.2.
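Both parts of Theorem 5.2 can be checked for small $n$ by brute force. The sketch below is our own illustration: `jacobi` computes the Jacobi symbol by the usual reciprocity algorithm, `is_euler` tests the Euler pseudoprime condition in the form $b^{(n-1)/2} \equiv \left(\frac{b}{n}\right) \pmod n$, and `strong_liar_count` counts the bases $1 \le b < n$ to which a composite $n$ is a strong pseudoprime, asserting along the way that each of them is also an Euler base (property (i)).

```python
from math import gcd, isqrt

def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd n > 0, computed by quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:              # pull out factors of 2 using (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                    # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def is_strong(n: int, b: int) -> bool:
    """Condition (5.12) with n - 1 = 2^t m, m odd."""
    m, t = n - 1, 0
    while m % 2 == 0:
        m, t = m // 2, t + 1
    x = pow(b, m, n)
    if x in (1, n - 1):
        return True
    for _ in range(t - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def is_euler(n: int, b: int) -> bool:
    """Euler pseudoprime condition: b^((n-1)/2) = (b/n) (mod n)."""
    return pow(b, (n - 1) // 2, n) == jacobi(b, n) % n

def is_prime(n: int) -> bool:
    return n > 1 and all(n % q for q in range(2, isqrt(n) + 1))

def strong_liar_count(n: int) -> int:
    """Number of bases 1 <= b < n to which the odd composite n is a strong pseudoprime."""
    liars = [b for b in range(1, n) if gcd(b, n) == 1 and is_strong(n, b)]
    assert all(is_euler(n, b) for b in liars)      # property (i)
    return len(liars)

for n in range(9, 1000, 2):
    if not is_prime(n):
        assert 4 * strong_liar_count(n) <= n - 1   # property (ii)
```

For $n = 9$ the count is 2 (the bases 1 and 8), exactly a quarter of the 8 bases $1 \le b < 9$, so the bound in (ii) is attained when it is measured over all $1 \le b < n$.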


The Euler test and the strong pseudoprime test require some sophisticated techniques from the theory of quadratic residues. We summarize the main conclusions of this section as follows:


(A) $n$ is a strong pseudoprime to base $b$ $\Rightarrow$ $n$ is an Euler pseudoprime to base $b$ $\Rightarrow$ $n$ is a Fermat pseudoprime to base $b$; therefore, the strong pseudoprime test is the strongest of the three tests.

(B) Although none of these tests can prove that $n$ is prime, the probabilistic strong pseudoprime test, that is, the Miller-Rabin method, decides whether an odd number $n$ is prime with a success probability (see (5.13)) that can be made arbitrarily close to 1. That is,
$$P\{\text{correctly decide whether the odd number } n \text{ is prime}\} > 1 - \varepsilon, \quad \forall\, \varepsilon > 0 \text{ given}.$$
Moreover, the computational complexity of the detection algorithm is polynomial.


5.3 Monte Carlo Method 


With all the primality tests introduced in the previous two sections, for a huge odd number $n$, even if we already know that $n$ is not prime, we cannot factor $n$, because a primality test provides no information about the prime factors. A more direct method, like the sieve method, checks whether $n$ is divisible by each prime $p \le \sqrt{n}$, because a composite number $n$ must have a prime factor $p \le \sqrt{n}$. For a selected $p \le \sqrt{n}$, dividing $n$ by $p$ requires $O(\log n)$ bit operations, and there are $O\left(\frac{\sqrt{n}}{\log n}\right)$ primes $p \le \sqrt{n}$ in total; therefore, such a verification requires $O(\sqrt{n})$ bit operations. A more effective method was proposed by J. M. Pollard in 1975. We call it the Monte Carlo method, or the "rho" method.

First, choose a convenient mapping $f: \mathbb{Z}_n \to \mathbb{Z}_n$; for example, $f(x)$ is a polynomial with integer coefficients, such as $f(x) = x^2 + 1$. Second, generate an initial value $x_0 \in \mathbb{Z}_n$ at random and let $x_1 = f(x_0)$, $x_2 = f(x_1)$, $\ldots$, $x_{j+1} = f(x_j)$ $(j = 0, 1, 2, \ldots)$. Among these $x_j$, we want to find two elements $x_j$ and $x_k$ that are different elements of $\mathbb{Z}_n$ but the same element of $\mathbb{Z}_d$ for some factor $d$ of $n$, $d \mid n$; that is to say,
$$x_j \not\equiv x_k \pmod n, \quad (x_j - x_k, n) > 1. \tag{5.23}$$
Once $x_j$ and $x_k$ are found, the algorithm is complete.
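The search for a pair satisfying (5.23) can be sketched directly. This hypothetical `rho_naive` assumes $f(x) = x^2 + c$ and compares each new $x_k$ with every earlier $x_j$, which is exactly the cost that the improved algorithm later in this section avoids.

```python
from math import gcd

def rho_naive(n: int, x0: int = 1, c: int = 1, max_steps: int = 10_000):
    """Search for j < k satisfying (5.23), using f(x) = x^2 + c (mod n).

    Every new x_k is compared with all earlier x_j, so step k costs k gcds.
    Returns (d, j, k) with d = (x_k - x_j, n), or None.
    """
    xs = [x0 % n]
    for k in range(1, max_steps):
        xk = (xs[-1] * xs[-1] + c) % n           # x_k = f(x_{k-1})
        for j, xj in enumerate(xs):
            d = gcd(xk - xj, n)
            if 1 < d < n:                        # x_j != x_k (mod n) but (x_k - x_j, n) > 1
                return d, j, k
            if d == n:                           # x_k = x_j (mod n): the sequence cycled mod n
                return None                      # try another f or x0
        xs.append(xk)
    return None
```

With $n = 91$, $x_0 = 1$ and $f(x) = x^2 + 1$, the sequence is $1, 2, 5, 26, \ldots$ and `rho_naive(91)` stops at $k = 3$, $j = 2$, because $(26 - 5, 91) = 7$.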


Theorem 5.3 Let $S$ be a set of $r$ elements, let $f: S \to S$ be a mapping and $x_0 \in S$, and define $x_{j+1} = f(x_j)$ $(j = 0, 1, 2, \ldots)$. Suppose $\lambda$ is a positive real number and let $l = 1 + [\sqrt{2\lambda r}]$. Then, among all pairs $(f, x_0)$ consisting of a mapping $f: S \to S$ and an initial value $x_0 \in S$, the proportion of pairs for which $x_0, x_1, \ldots, x_l$ are pairwise distinct is less than $e^{-\lambda}$.


Proof The total number of mappings $f: S \to S$ is $r^r$, because for each $x \in S$ there are $r$ choices for the image $f(x)$. The initial value $x_0$ has $r$ choices, so the total number of pairs $(f, x_0)$ is $r^{r+1}$. The question is which of these pairs $(f, x_0)$ satisfy the condition that $x_0, x_1, \ldots, x_l$ are distinct elements of $S$. We want to prove that the proportion of such pairs among all $r^{r+1}$ pairs is less than $e^{-\lambda}$.

There are $r$ choices of $x_0 \in S$; then $x_1 = f(x_0)$ has only $r - 1$ choices, $x_2 = f(x_1)$ has only $r - 2$ choices, and so on, until $x_l = f(x_{l-1})$, which has only $r - l$ choices. The values of $f$ on the remaining $r - l$ elements of $S$ can be selected arbitrarily; that is, there are $r^{r-l}$ choices. Therefore, the number $N$ of pairs $(f, x_0)$ meeting the required condition is
$$N = r^{r-l} \prod_{j=0}^{l} (r - j).$$
Dividing $N$ by $r^{r+1}$, the proportion of pairs $(f, x_0)$ satisfying the condition is
$$\frac{N}{r^{r+1}} = r^{-l} \prod_{j=1}^{l} (r - j) = \prod_{j=1}^{l} \left(1 - \frac{j}{r}\right). \tag{5.24}$$
Note that for a real number $x \in (0, 1)$, $\log(1 - x) < -x$. Taking the logarithm of the right side of the above formula, we get
$$\sum_{j=1}^{l} \log\left(1 - \frac{j}{r}\right) < -\sum_{j=1}^{l} \frac{j}{r} = -\frac{l(l+1)}{2r} < -\frac{l^2}{2r}.$$
Because $l = 1 + [\sqrt{2\lambda r}] > \sqrt{2\lambda r}$, the above formula gives
$$\sum_{j=1}^{l} \log\left(1 - \frac{j}{r}\right) < -\lambda.$$
By (5.24), we have
$$\frac{N}{r^{r+1}} < e^{-\lambda}.$$
We complete the proof of Theorem 5.3.
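Theorem 5.3 can be verified exhaustively for a tiny set. The following sketch is our own, with $S = \{0, \ldots, r - 1\}$ and every map $f$ stored as a tuple of images; it counts the pairs $(f, x_0)$ for which $x_0, \ldots, x_l$ are distinct and compares the proportion with the product (5.24) and the bound $e^{-\lambda}$.

```python
import itertools
import math
from fractions import Fraction

def distinct_fraction(r: int, l: int) -> Fraction:
    """Exact proportion of pairs (f, x0) with x0, x1, ..., x_l pairwise distinct,
    over all maps f: S -> S and all x0 in S, where S = {0, ..., r-1}."""
    good = 0
    for f in itertools.product(range(r), repeat=r):   # f[x] is the image of x
        for x0 in range(r):
            seen, x = {x0}, x0
            for _ in range(l):
                x = f[x]                               # x_{j+1} = f(x_j)
                if x in seen:
                    break
                seen.add(x)
            else:                                      # no repetition among x_0, ..., x_l
                good += 1
    return Fraction(good, r ** (r + 1))

def product_formula(r: int, l: int) -> Fraction:
    """Right-hand side of (5.24): prod_{j=1}^{l} (1 - j/r)."""
    result = Fraction(1)
    for j in range(1, l + 1):
        result *= 1 - Fraction(j, r)
    return result

r, lam = 5, 1.0
l = 1 + math.floor(math.sqrt(2 * lam * r))   # l = 1 + [sqrt(2 lambda r)] = 4
p = distinct_fraction(r, l)
print(p, product_formula(r, l), float(p) < math.exp(-lam))   # 24/625 24/625 True
```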


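The Monte Carlo method uses a polynomial $f(x) \in \mathbb{Z}[x]$. For a positive integer $n$, congruence modulo $n$ is preserved by the polynomial $f(x)$, that is,
$$a \equiv b \pmod n \Rightarrow f(a) \equiv f(b) \pmod n, \tag{5.25}$$
and the same holds modulo any divisor $r$ of $n$.

Given $x_0 \in \mathbb{Z}_n$ and $x_{j+1} = f(x_j)$ $(j = 0, 1, \ldots)$, suppose we find an $x_{k_0} \in \mathbb{Z}_n$ satisfying $x_{k_0} \equiv x_{j_0} \pmod r$, where $r \mid n$, $r > 1$ and $k_0 > j_0$. By (5.25),
$$f(x_{k_0}) \equiv f(x_{j_0}) \pmod r \Rightarrow x_{k_0+1} \equiv x_{j_0+1} \pmod r.$$
Thus for any $k > j \ge j_0$ with $k - j = k_0 - j_0$, we have $x_k \equiv x_j \pmod r$. This proves that if $k_0$ is the first such subscript, the polynomial mapping $f: \mathbb{Z}_n \to \mathbb{Z}_n$ produces only the $k_0$ different residue classes modulo $r$ $(r \mid n)$
$$\{x_0, x_1, \ldots, x_{k_0 - 1}\}.$$


Therefore, we have the following Lemma 5.14.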


Lemma 5.14 Let $f(x) \in \mathbb{Z}[x]$ be a polynomial and $n > 1$ a positive integer, and let $x_0 \in \mathbb{Z}_n$, $x_j = f(x_{j-1})$ $(j = 1, 2, \ldots)$. If $k$ is the first subscript for which there is a $j$, $0 \le j < k$, such that
$$(x_k - x_j, n) = r > 1,$$
then $x_0, x_1, \ldots, x_{k-1}$ are $k$ different residue classes modulo $r$, so they are also $k$ different residue classes modulo $n$. Moreover, the Monte Carlo computation defined by $f$ can produce only these $k$ different residue classes modulo $r$.


We call the polynomial $f$ and the initial value $x_0$ described in Lemma 5.14 an average mapping. When the first subscript $k$ is very large, the amount of calculation is very large, because each new $x_k$ is compared with all earlier $x_j$. Here we give an improved Monte Carlo algorithm.

Given $f(x) \in \mathbb{Z}[x]$, the Monte Carlo algorithm computes $x_k$ $(k = 1, 2, \ldots)$ successively. Let $2^h \le k < 2^{h+1}$ $(h \ge 0)$ and $j = 2^h - 1$; that is, $k$ is an $(h+1)$-bit number and $j$ is the largest $h$-bit number. Compare $x_k$ with $x_j$ and calculate $(x_k - x_j, n)$: if $(x_k - x_j, n) > 1$, the calculation terminates; otherwise, consider $k + 1$. The improved Monte Carlo algorithm computes $(x_k - x_j, n)$ only once for each $k$, with $j = 2^h - 1$; there is no need to check every $j$, $0 \le j < k$. When $k$ is very large, this saves a great deal of computation, but there is a disadvantage: it may miss the smallest subscript $k$ satisfying the condition. However, the error is controllable; in fact, we have the following error estimate.


Lemma 5.15 Let $f(x) \in \mathbb{Z}[x]$ and $n > 1$ be given, $x_0 \in \mathbb{Z}_n$, $x_j = f(x_{j-1})$ $(j = 1, 2, \ldots)$. Let $k_0$ be the smallest subscript such that $(x_{k_0} - x_{j_0}, n) > 1$ for some $0 \le j_0 < k_0$, and let $k$ be the smallest positive integer satisfying $(x_k - x_j, n) > 1$ in the improved Monte Carlo algorithm. Then $k \le 4 k_0$.


Proof Suppose $k_0$ has $h + 1$ bits, that is, $2^h \le k_0 < 2^{h+1}$. Let $j = 2^{h+1} - 1$ and $k = j + (k_0 - j_0)$. Since $j \ge k_0 > j_0$, the periodicity established before Lemma 5.14 gives
$$(x_{k_0} - x_{j_0}, n) > 1 \Rightarrow (x_k - x_j, n) > 1.$$
Obviously, $j$ is the largest $(h+1)$-bit number and $k$ is an $(h+2)$-bit number, so $k$ is a subscript examined by the improved Monte Carlo algorithm, and the algorithm stops no later than at this $k$. Obviously,
$$k = j + (k_0 - j_0) \le 2^{h+1} - 1 + 2^{h+1} < 4 \cdot 2^h \le 4 k_0.$$
Lemma 5.15 holds.
Example 5.6 Let $n = 91$, $f(x) = x^2 + 1$ and $x_0 = 1$. By the Monte Carlo algorithm, $x_1 = 2$, $x_2 = 5$, $x_3 = 26$ and $x_4 = 40$ (because $26^2 + 1 \equiv 40 \pmod{91}$). In the improved Monte Carlo algorithm, at step $k = 4$ (where $j = 2^2 - 1 = 3$) we examine $(x_4 - x_3, 91)$ and obtain
$$(x_4 - x_3, 91) = (14, 91) = 7.$$
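The improved algorithm compares $x_k$ only with $x_j$, $j = 2^h - 1$. The following is a minimal sketch, again assuming $f(x) = x^2 + c$; the name `rho_improved` and the step limit are our own. With $n = 91$, $x_0 = 1$ and $c = 1$ it reproduces Example 5.6.

```python
from math import gcd

def rho_improved(n: int, x0: int = 1, c: int = 1, max_steps: int = 1 << 20):
    """Improved Monte Carlo: for 2^h <= k < 2^(h+1), compare x_k only with x_j, j = 2^h - 1.

    Uses f(x) = x^2 + c (mod n). Returns (d, k) with 1 < d < n, or None.
    """
    x = x0 % n            # holds x_{k-1} at the top of each iteration
    saved = x             # holds x_j with j = 2^h - 1
    for k in range(1, max_steps):
        if k & (k - 1) == 0:          # k = 2^h, so j becomes 2^h - 1 = k - 1
            saved = x
        x = (x * x + c) % n           # x_k = f(x_{k-1})
        d = gcd(x - saved, n)
        if 1 < d < n:
            return d, k
        if d == n:                    # x_k = x_j (mod n): retry with another x0 or c
            return None
    return None
```

Only one gcd is computed per step, and by Lemma 5.15 the stopping subscript is at most four times the first subscript $k_0$.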


Lemma 5.16 Let $n$ be an odd composite number and $r$ a factor of $n$, $r \mid n$, $1 < r \le \sqrt{n}$. Let $f(x) \in \mathbb{Z}[x]$ and $x_0 \in \mathbb{Z}_n$ be given. Then the computational complexity of finding $r$ by the Monte Carlo algorithm $(f, x_0)$ is
$$\text{Time}((f, x_0)) = O(\sqrt{n} \log^3 n) \text{ bit operations}. \tag{5.26}$$
Further, there is a constant $C$ such that for any positive real number $\lambda$, the success probability of the Monte Carlo algorithm $(f, x_0)$ in finding a nontrivial factor $r$ of $n$ is greater than $1 - e^{-\lambda}$, that is,
$$P\{(f, x_0) \text{ finds } r \mid n,\ r > 1\} \ge 1 - e^{-\lambda}, \tag{5.27}$$
and the number of bit operations required by the algorithm, which depends on the parameter $\lambda$ (to ensure this success rate), is at most $C\sqrt{\lambda}\,\sqrt[4]{n}\,\log^3 n$, that is, $O\left(\sqrt{\lambda}\,\sqrt[4]{n}\,\log^3 n\right)$.


Proof From the discussion of computational complexity in Chap. 1, computing the greatest common divisor of two integers, as well as addition, subtraction, multiplication and division modulo $n$, takes polynomial time. Let $C_1$ satisfy
$$\text{Time}((y - z, n)) \le C_1 \log^3 n, \quad y, z \in \mathbb{Z}_n,$$
and let $C_2$ satisfy
$$\text{Time}(f(x) \bmod n) \le C_2 \log^2 n, \quad x \in \mathbb{Z}_n.$$
If $k_0$ is the first subscript in the computation of $(f, x_0)$ satisfying $(x_{k_0} - x_{j_0}, n) > 1$, then the improved Monte Carlo algorithm finds $(x_k - x_j, n) > 1$, where $j = 2^h - 1$ and $2^h \le k < 2^{h+1}$. By Lemma 5.15, $k \le 4 k_0$. Thus
$$\text{Time}(\text{finding the subscript } k \text{ by } (f, x_0)) \le 4 k_0 (C_1 \log^3 n + C_2 \log^2 n). \tag{5.28}$$
Let $(x_{k_0} - x_{j_0}, n) = r > 1$, $r \le \sqrt{n}$. By Lemma 5.14, $x_0, \ldots, x_{k_0 - 1}$ are distinct modulo $r$, so $k_0 \le r$, and
$$\text{Time}(\text{finding } r,\ r \mid n,\ r \le \sqrt{n}) \le 4\sqrt{n}\,(C_1 \log^3 n + C_2 \log^2 n).$$
This proves (5.26). In the sense of probability, that is, on the premise of allowing certain errors, (5.26) can be further improved.

Let $\lambda > 0$ be any given real number. By Theorem 5.3 (applied to $S = \mathbb{Z}_r$), the proportion of pairs $(f, x_0)$ with $k_0 > 1 + \sqrt{2\lambda r}$ is less than $e^{-\lambda}$; in other words, the probability of successfully finding $r$, $r \mid n$, $r \le \sqrt{n}$, within that many steps is
$$P\{\text{find } r,\ r \mid n,\ r \le \sqrt{n}\} \ge 1 - e^{-\lambda}.$$
To ensure this success rate, it suffices to handle $k_0 \le 1 + \sqrt{2\lambda r}$. By (5.28) and $r \le \sqrt{n}$, the number of bit operations required is not greater than
$$4\left(1 + \sqrt{2\lambda r}\right)(C_1 \log^3 n + C_2 \log^2 n) = O\left(\sqrt{\lambda}\,\sqrt[4]{n}\,\log^3 n\right).$$
We have completed the proof of the Lemma.


Remark 5.1 A basic assumption of the Monte Carlo method is that the integer-coefficient polynomial $f$ behaves like an average mapping (see Lemma 5.14); this has not yet been proved.


5.4 Fermat Decomposition and Factor Basis Method 


Lemma 5.17 Suppose $n$ is an odd number. There is a one-to-one correspondence between the factorizations $n = a \cdot b$ $(a \ge b > 0)$ of $n$ and the expressions $n = t^2 - s^2$ ($t$ and $s$ nonnegative integers) of $n$. The correspondence $\sigma: (a, b) \mapsto (t, s)$ can be written as $\sigma((a, b)) = (t, s)$, where
$$\sigma((a, b)) = \left(\frac{a + b}{2}, \frac{a - b}{2}\right).$$
The inverse mapping is
$$\sigma^{-1}((t, s)) = (t + s, t - s).$$


Proof If $n = ab$, then because both $a$ and $b$ are odd, $n = \left(\frac{a + b}{2}\right)^2 - \left(\frac{a - b}{2}\right)^2$, so define
$$\sigma((a, b)) = \left(\frac{a + b}{2}, \frac{a - b}{2}\right).$$
Conversely, if $n = t^2 - s^2$, then $n = (t + s)(t - s)$, so define $\sigma^{-1}((t, s)) = (t + s, t - s)$. We prove $\sigma^{-1}\sigma = \sigma\sigma^{-1} = 1$. By the definition,
$$\sigma^{-1}\sigma((a, b)) = \sigma^{-1}\left(\frac{a + b}{2}, \frac{a - b}{2}\right) = (a, b),$$
$$\sigma\sigma^{-1}((t, s)) = \sigma(t + s, t - s) = (t, s).$$
So $\sigma$ is a one-to-one correspondence between the two decompositions $n = ab = t^2 - s^2$, and the Lemma holds.


The above simple lemma provides a method of factorization, called Fermat factorization: if $n = ab$ with $a$ very close to $b$, then $n = \left(\frac{a + b}{2}\right)^2 - \left(\frac{a - b}{2}\right)^2 = t^2 - s^2$, where $s$ is very small and $t$ is only a little larger than $\sqrt{n}$. Therefore, starting from $t = [\sqrt{n}] + 1$, we successively test whether $t^2 - n$ is a perfect square. If not, we change to $t = [\sqrt{n}] + 2$ and test again, and so on, until $t^2 - n = s^2$; then Fermat factorization gives $n = (t + s)(t - s)$. This method is effective when $n = ab$ with $a$ and $b$ very close.
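Fermat factorization translates almost line by line into code. This is a minimal sketch of our own (`fermat_factor` is a hypothetical name); it starts at $t = \lceil \sqrt{n} \rceil$, which equals $[\sqrt{n}] + 1$ when $n$ is not a perfect square, and uses `math.isqrt` for exact perfect-square tests.

```python
from math import isqrt

def fermat_factor(n: int) -> tuple:
    """Fermat factorization of an odd n > 1: returns (a, b) with n = a * b, a >= b."""
    t = isqrt(n)
    if t * t < n:
        t += 1                       # t = ceil(sqrt(n))
    while True:
        s2 = t * t - n
        s = isqrt(s2)
        if s * s == s2:              # t^2 - n is a perfect square s^2
            return t + s, t - s      # sigma^{-1}((t, s)) from Lemma 5.17
        t += 1                       # terminates by t = (n + 1) / 2 at the latest
```

For $n = 1649$ the search passes through $t = 41, 42, \ldots$ until $57^2 - 1649 = 1600 = 40^2$, giving $1649 = 97 \cdot 17$. When $a$ and $b$ are far apart the loop runs for a long time, in line with the remark that the method is effective only for close factors.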

Fermat factorization can be further extended into the factor base method, a more effective factorization method. Its basic idea is as follows: Fermat factorization requires $t^2 - n$ to be a perfect square, which rarely happens in practice, but $t^2 \equiv s^2 \pmod n$ with $t \not\equiv \pm s \pmod n$ is much easier to obtain. Then $n \mid (t + s)(t - s)$ while $n \nmid t + s$ and $n \nmid t - s$, so the greatest common divisors $(t + s, n)$ and $(t - s, n)$ are nontrivial factors of $n$. When $n$ is a product of two different primes, this gives the factorization
$$n = (t + s, n)(t - s, n).$$


Definition 5.5 Let $B = \{p_1, p_2, \ldots, p_h\}$ be a set of $h$ different primes (possibly with $p_1 = -1$); $B$ is called a factor base. Given a positive integer $n$, an integer $b$ is called a $B$-number if the least absolute residue of $b^2$ modulo $n$ (the residue of smallest absolute value, lying between $-n/2$ and $n/2$) can be expressed as a product of numbers in $B$.


Example 5.7 Let $n = 4633$ and $B = \{-1, 2, 3\}$. Then 67, 68 and 69 are all $B$-numbers, because $67^2 \equiv -144 \pmod{4633}$, $68^2 \equiv -9 \pmod{4633}$ and $69^2 \equiv 128 \pmod{4633}$.
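Testing whether $b$ is a $B$-number amounts to reducing $b^2$ to its least absolute residue and trial-dividing by the primes in $B$. The helpers below are hypothetical names of our own; they assume the convention that $-1$ is listed first in the factor base, and they return the exponent vector $(\alpha_1, \ldots, \alpha_h)$ used in the rest of this section.

```python
def least_abs_residue(a: int, n: int) -> int:
    """Residue of a modulo n with the smallest absolute value, in (-n/2, n/2]."""
    r = a % n
    return r - n if r > n // 2 else r

def exponent_vector(b: int, n: int, base):
    """Exponents alpha_j with b^2 mod n = prod p_j^alpha_j over the factor base,
    or None if b is not a B-number. Assumes base[0] == -1."""
    a = least_abs_residue(b * b, n)
    if a == 0:
        return None
    exps = [1 if a < 0 else 0]          # exponent of p_1 = -1
    a = abs(a)
    for p in base[1:]:
        e = 0
        while a % p == 0:
            a //= p
            e += 1
        exps.append(e)
    return exps if a == 1 else None     # leftover factor: not a B-number

def binary_vector(exps):
    """Binary vector e in F_2^h: parity of each exponent."""
    return [e % 2 for e in exps]

for b in (67, 68, 69, 70):
    print(b, exponent_vector(b, 4633, [-1, 2, 3]))
```

For Example 5.7 this gives $(1, 4, 2)$, $(1, 0, 2)$ and $(0, 7, 0)$ for 67, 68 and 69, while $70^2 \equiv 267 = 3 \cdot 89$ is not a $B$-number.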


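If $b$ is a $B$-number, let $b^2 \bmod n$ denote the least absolute residue of $b^2$ modulo $n$. By the definition,
$$b^2 \bmod n = \prod_{j=1}^{h} p_j^{\alpha_j}, \quad \alpha_j \ge 0.$$
Let $e = (e_1, e_2, \ldots, e_h) \in \mathbb{F}_2^h$ be the $h$-dimensional binary vector defined by
$$e_j = \begin{cases} 0, & \text{if } \alpha_j \text{ is even}; \\ 1, & \text{if } \alpha_j \text{ is odd}, \end{cases} \quad 1 \le j \le h.$$
$e$ is called the binary vector corresponding to $b$. Let $A = \{b_i\}$ be a set of $B$-numbers. Denote the binary vector corresponding to each $b_i$ by $e_i = (e_{i1}, e_{i2}, \ldots, e_{ih})$, and denote $b_i^2 \bmod n$ by $a_i$. We have
$$a_i = \prod_{j=1}^{h} p_j^{\alpha_{ij}}, \qquad \prod_{i \in A} a_i = \prod_{j=1}^{h} p_j^{\sum_{i \in A} \alpha_{ij}}.$$
Suppose $\sum_{i \in A} e_i = (0, 0, \ldots, 0)$ is the zero vector in $\mathbb{F}_2^h$; then
$$\sum_{i \in A} \alpha_{ij} \equiv 0 \pmod 2, \quad \forall\, 1 \le j \le h.$$
That is, $\prod_{i \in A} a_i$ is a square number. Let $r_j = \frac{1}{2}\sum_{i \in A} \alpha_{ij}$; then
$$\prod_{i \in A} a_i = \left(\prod_{j=1}^{h} p_j^{r_j}\right)^2, \quad \text{define } c = \prod_{j=1}^{h} p_j^{r_j}. \tag{5.29}$$
On the other hand, let $b_i \bmod n$ denote the least nonnegative residue of $b_i$ modulo $n$, and let
$$b = \prod_{i \in A} (b_i \bmod n) = \prod_{i \in A} \delta_i, \tag{5.30}$$
where $\delta_i = b_i \bmod n$, that is, $0 \le \delta_i < n$ and $b_i \equiv \delta_i \pmod n$. Thus
$$\prod_{i \in A} b_i \equiv b \pmod n.$$
Because $a_i = b_i^2 \bmod n$, we have $b_i^2 \equiv a_i \pmod n$, and therefore
$$b^2 \equiv \prod_{i \in A} b_i^2 \equiv \prod_{i \in A} a_i = c^2 \pmod n.$$


The integers $b$ and $c$ defined by (5.30) and (5.29) satisfy $b^2 \equiv c^2 \pmod n$. We state the above analysis as the following lemma.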


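Lemma 5.18 Let $A = \{b_1, b_2, \ldots, b_i, \ldots\}$ be a finite set of $B$-numbers, let $e_i = (e_{i1}, e_{i2}, \ldots, e_{ih}) \in \mathbb{F}_2^h$ be the binary vector corresponding to $b_i$, $a_i = b_i^2 \bmod n$ and $\delta_i = b_i \bmod n$. If $\sum_{i \in A} e_i = 0$ is the zero vector in $\mathbb{F}_2^h$, then $\prod_{i \in A} a_i$ is a square number. Write
$$\prod_{i \in A} a_i = \prod_{j=1}^{h} p_j^{\sum_{i \in A} \alpha_{ij}} = c^2,$$
where
$$c = \prod_{j=1}^{h} p_j^{r_j}, \quad r_j = \frac{1}{2}\sum_{i \in A} \alpha_{ij}.$$
Further let $b = \delta_1 \delta_2 \cdots$; then we have $b^2 \equiv c^2 \pmod n$.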
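Lemma 5.18 can be applied to Example 5.7 by hand or in code. In this sketch (the helper name `combine` is ours), the exponent vectors of 67 and 68 are $(1, 4, 2)$ and $(1, 0, 2)$, whose binary vectors sum to zero. We drop the sign factor $(-1)^{r_1}$ from $c$, which does not change $c^2$, and reduce $b$ modulo $n$, which does not change $b^2 \bmod n$.

```python
from math import gcd, prod

def combine(bs, exps, base, n):
    """Build b and c as in Lemma 5.18 from B-numbers bs whose exponent vectors
    exps sum to an even vector. The sign factor (-1)^(r_1) is dropped from c."""
    sums = [sum(col) for col in zip(*exps)]            # sum_i alpha_ij for each j
    if any(s % 2 for s in sums):
        raise ValueError("binary vectors do not sum to zero in F_2^h")
    b = prod(x % n for x in bs) % n                    # b = prod delta_i, reduced mod n
    c = prod(p ** (s // 2) for p, s in zip(base, sums) if p != -1) % n   # prod p_j^{r_j}
    return b, c

n, base = 4633, [-1, 2, 3]
b, c = combine([67, 68], [[1, 4, 2], [1, 0, 2]], base, n)   # e_67 + e_68 = 0
print(b, c, (b * b - c * c) % n, gcd(b + c, n), gcd(b - c, n))   # 4556 36 0 41 113
```

Here $b \not\equiv \pm c \pmod n$, so both greatest common divisors are nontrivial and $4633 = 41 \cdot 113$.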


By the above lemma, if $b^2 \equiv c^2 \pmod n$ and $b \not\equiv \pm c \pmod n$, then we find a nontrivial factor $d = (b + c, n)$ of $n$. Now the question is: if $b^2 \equiv c^2 \pmod n$, how likely is $b \not\equiv \pm c \pmod n$? We may assume $(b, n) = (c, n) = 1$, since otherwise $(b, n)$ or $(c, n)$ already yields a factor of $n$. Then $b^2 \equiv c^2 \pmod n \Rightarrow (b c^{-1})^2 \equiv 1 \pmod n$, and the problem is transformed into counting the solutions $x$ of $x^2 \equiv 1 \pmod n$, $1 \le x < n$.


Lemma 5.19 Let $n > 1$ be an odd number. Then the number of solutions of $x^2 \equiv 1 \pmod n$ is $2^r$, where $r$ is the number of different prime factors of $n$.


Proof If $r = 1$, then $n = p^\alpha$ $(\alpha \ge 1)$ with $p$ an odd prime, and $x^2 \equiv 1 \pmod{p^\alpha}$ has exactly two solutions $x \equiv \pm 1$. Indeed, let $g$ be a primitive root modulo $p^\alpha$ and write $x = g^t$ $(1 \le t \le p^{\alpha-1}(p - 1))$; then $x^2 \equiv 1 \iff p^{\alpha-1}(p - 1) \mid 2t$. So there are only two solutions, $t = \frac{1}{2} p^{\alpha-1}(p - 1)$ and $t = p^{\alpha-1}(p - 1)$, that is, $x \equiv \pm 1 \pmod{p^\alpha}$. If $n = p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, then by the Chinese remainder theorem the number of solutions of $x^2 \equiv 1 \pmod n$ is $2^r$. The Lemma holds.
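A brute-force count confirms Lemma 5.19 for small odd moduli. The helper names below are our own; `distinct_prime_factors` uses plain trial division.

```python
def count_square_roots_of_one(n: int) -> int:
    """Number of x, 1 <= x < n, with x^2 = 1 (mod n)."""
    return sum(1 for x in range(1, n) if x * x % n == 1)

def distinct_prime_factors(n: int) -> int:
    """Number r of different prime factors of n (trial division)."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

for n in range(3, 2000, 2):
    assert count_square_roots_of_one(n) == 2 ** distinct_prime_factors(n)
```

For instance, $n = 105 = 3 \cdot 5 \cdot 7$ has $2^3 = 8$ square roots of 1.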


Lemma 5.20 Let $n$ be an odd number divisible by at least two different primes, and let $B = \{p_1, p_2, \ldots, p_h\}$ be a factor base. If two $B$-numbers $b$ and $c$ with $b^2 \equiv c^2 \pmod n$ are selected at random, then the probability that $b \equiv \pm c \pmod n$ is at most $\frac{1}{2}$.


Proof $x^2 \equiv 1 \pmod n$ has $2^r$ different solutions modulo $n$, where $r \ge 2$. The two solutions $x \equiv \pm 1 \pmod n$ correspond to $b \equiv \pm c \pmod n$. Thus
$$P\{b \equiv \pm c \pmod n \mid b^2 \equiv c^2 \pmod n\} \le \frac{2}{2^r} \le \frac{1}{2}.$$
Lemma 5.20 holds.


According to Lemma 5.20, when $b$ and $c$ are selected by means of the factor base, the selection fails if $b \equiv \pm c \pmod n$, and the probability of failure is at most $\frac{1}{2}$. If the selection fails, select another pair $b_1$ and $c_1$. In this way, we select $k$ pairs $b$ and $c$ at random and almost independently, and the probability of obtaining a pair with $b \not\equiv \pm c \pmod n$ is
$$P\{b^2 \equiv c^2 \pmod n,\ b \not\equiv \pm c \pmod n\} \ge 1 - \frac{1}{2^k}. \tag{5.31}$$
In other words, the probability of finding a nontrivial factor $d = (b + c, n)$ of $n$ by using the factor base can be made arbitrarily close to 1. Below, we systematically summarize the factor base method as follows:
Factor base method

Let $n$ be a large odd number and $y$ an appropriately selected integer (e.g., $y < n$), and let the factor base be
$$B = \{-1\} \cup \{p \mid p \text{ is prime},\ p \le y\}.$$
Select a certain number of $B$-numbers at random, $A_1 = \{b_1, b_2, \ldots, b_N\}$; usually $N \approx \pi(y) + 2$ will meet the needs, since $B$ has $\pi(y) + 1$ elements and any $\pi(y) + 2$ vectors in $\mathbb{F}_2^{\pi(y)+1}$ are linearly dependent. Express each $b_i^2 \bmod n$ as a product of numbers in $B$ and calculate the corresponding binary vector $e_i$. Select a subset $A \subset A_1$ such that $\sum_{b_i \in A} e_i = 0$, where $e_i$ is the binary vector corresponding to $b_i$, and denote $A = \{b_1, b_2, \ldots, b_i, \ldots\}$. Let
$$b = \prod_{i \in A} (b_i \bmod n) = \prod_{i \in A} \delta_i, \quad \text{where } \delta_i = b_i \bmod n,$$
and
$$c = \prod_{p_j \in B} p_j^{r_j} \bmod n, \quad r_j = \frac{1}{2}\sum_{i \in A} \alpha_{ij}.$$
We have $b^2 \equiv c^2 \pmod n$. If $b \equiv \pm c \pmod n$, then reselect the subset $A$, until finally $b \not\equiv \pm c \pmod n$. In this way, we find a nontrivial factor $d \mid n$ of $n$, $d = (b + c, n)$, and therefore the factorization $n = d \cdot \frac{n}{d}$.
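The whole procedure can be sketched in Python. This is an illustrative sketch under several assumptions of our own: the $B$-numbers are drawn uniformly at random (practical implementations choose $b$ more cleverly), the dependency $\sum_{i \in A} e_i = 0$ is found by Gaussian elimination over $\mathbb{F}_2$ on bit masks, $c$ is taken without the sign factor $(-1)^{r_1}$ (which does not change $c^2$), and $n$ must be an odd composite that is not a prime power; otherwise, by Lemma 5.19, every subset gives $b \equiv \pm c$ and the loop never ends. All function names are hypothetical.

```python
import random
from math import gcd, isqrt, prod

def small_primes(y: int):
    """All primes p <= y (trial division; adequate for small y)."""
    return [p for p in range(2, y + 1) if all(p % q for q in range(2, isqrt(p) + 1))]

def exponent_vector(b: int, n: int, base):
    """Exponents of the least absolute residue of b^2 mod n over base (base[0] == -1),
    or None if b is not a B-number."""
    a = b * b % n
    if a > n // 2:
        a -= n
    if a == 0:
        return None
    exps = [1 if a < 0 else 0]
    a = abs(a)
    for p in base[1:]:
        e = 0
        while a % p == 0:
            a //= p
            e += 1
        exps.append(e)
    return exps if a == 1 else None

def dependencies(vectors):
    """Yield index lists A with sum_{i in A} e_i = 0 in F_2^h (Gaussian elimination)."""
    pivots = {}                                   # leading bit -> (row, combination of indices)
    for i, v in enumerate(vectors):
        row = sum((e & 1) << j for j, e in enumerate(v))   # binary vector e_i as a bit mask
        combo = 1 << i
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = (row, combo)
                break
            prow, pcombo = pivots[top]
            row ^= prow
            combo ^= pcombo
        else:                                     # row reduced to 0: a dependency
            yield [k for k in range(i + 1) if combo >> k & 1]

def factor_base_method(n: int, y: int, seed: int = 0):
    """Split an odd composite n (not a prime power) as (d, n // d)."""
    rng = random.Random(seed)
    base = [-1] + small_primes(y)                 # B = {-1} and all primes p <= y
    while True:
        bs, vecs = [], []
        while len(bs) < len(base) + 1:            # N = pi(y) + 2 B-numbers
            b = rng.randrange(2, n - 1)
            v = exponent_vector(b, n, base)
            if v is not None:
                bs.append(b)
                vecs.append(v)
        for A in dependencies(vecs):
            sums = [sum(vecs[i][j] for i in A) for j in range(len(base))]
            b = prod(bs[i] for i in A) % n
            c = prod(pow(p, s // 2, n) for p, s in zip(base[1:], sums[1:])) % n
            # Accept only b != +-c (mod n); otherwise try the next subset A.
            if (b - c) % n and (b + c) % n:
                d = gcd(b + c, n)
                return d, n // d
```

For example, `factor_base_method(4633, 30)` returns the factors 41 and 113 of the number in Example 5.7, in some order.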

Factorization by the factor base method cannot guarantee a 100% success rate, because $b \not\equiv \pm c \pmod n$ cannot be deduced from $b^2 \equiv c^2 \pmod n$; however, the success probability of factoring a large odd $n$ can be made arbitrarily close to 1. Under the condition that the success probability is at least $1 - \frac{1}{2^k}$ ($k$ a given constant), the computational complexity of factoring $n$ by the factor base method can be estimated as
$$\text{Time}(\text{factor base method to factor } n) = O\left(e^{C\sqrt{\log n \log\log n}}\right). \tag{5.32}$$
The proof of (5.32) is relatively complex, and no detailed proof is given here. Interested readers can refer to pages 136-141 of (Pomerance, 1982) in reference 5. The exact value of $C$ in (5.32) is unknown. It is generally conjectured that $C = 1 + \varepsilon$, where $\varepsilon > 0$ is any small positive real number.

Let $k$ be the number of bits of $n$; then the estimate on the right of (5.32) can be written as $O\left(e^{C_1\sqrt{k \log k}}\right)$ for some constant $C_1$. Therefore, the computational complexity of the factor base method is sub-exponential. By comparison, the computational complexity of the Monte Carlo method introduced in the previous section (see (5.26)) is exponential, because
$$O(\sqrt{n}) = O\left(e^{c_1 k}\right), \quad \text{where } c_1 = \frac{1}{2}\log 2.$$


As is well known, the security of the RSA public key cryptosystem is based on the difficulty of the prime factorization $n = pq$. There is no general method that efficiently factors an arbitrary large odd $n$: although the Monte Carlo method and the factor base method are probabilistic methods with a very high probability of successful factorization, their computational complexity is exponential and sub-exponential, respectively. This is the reason for choosing huge primes $p$ and $q$ in RSA.


5.5 Continued Fraction Method 


In the factor base method introduced in the previous section, $b^2 \bmod n$ is taken to be the least absolute residue of $b^2$ modulo $n$, that is,
$$b^2 \equiv b^2 \bmod n \pmod n, \quad |b^2 \bmod n| \le \frac{n}{2}.$$
The smaller $|b^2 \bmod n|$ is, the more likely it is to decompose into a product of small primes. The continued fraction method gives a systematic way to find integers $b$ with $|b^2 \bmod n| < 2\sqrt{n}$, so that $b^2 \bmod n$ is more likely to be a product of small primes. First, we introduce continued fractions and some of their basic properties.

Suppose $x \in \mathbb{R}$ is a real number, $[x]$ is the integer part of $x$ and $\{x\}$ is the fractional part of $x$. Let $a_0 = [x]$; if $\{x\} \ne 0$, let $a_1 = \left[\frac{1}{\{x\}}\right]$. Because $x = [x] + \{x\}$,
$$x = a_0 + \frac{1}{\{x\}^{-1}} = a_0 + \frac{1}{a_1 + \{\{x\}^{-1}\}}.$$
If $\{\{x\}^{-1}\} \ne 0$, write
$$a_2 = \left[\{\{x\}^{-1}\}^{-1}\right]$$
and continue in the same way. So we get
$$x = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}}.$$
The above formula is called the continued fraction expansion of the real number $x$. To save space, write $x = [a_0, a_1, \ldots, a_n, \ldots]$. The continued fraction expansion of $x$ is finite if and only if $x$ is a rational number, in which case we write
$$x = [a_0, a_1, \ldots, a_n], \quad \text{where } a_n > 1.$$
It is called the standard expansion of the rational number $x$.
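For a rational $x$ the expansion can be computed exactly with Python's `fractions.Fraction`, repeating the step $a_i = [x]$, $x \leftarrow 1/\{x\}$ until the fractional part vanishes. This is a small sketch of our own; the function name and the `max_terms` safeguard are illustrative choices.

```python
import math
from fractions import Fraction

def continued_fraction(x: Fraction, max_terms: int = 100) -> list:
    """Partial quotients [a_0, a_1, ...] of a rational x: a_0 = [x], then repeat on 1/{x}."""
    terms = []
    while len(terms) < max_terms:
        a = math.floor(x)          # integer part [x]
        terms.append(a)
        frac = x - a               # fractional part {x}
        if frac == 0:              # the expansion of a rational number is finite
            break
        x = 1 / frac
    return terms

print(continued_fraction(Fraction(43, 19)))   # [2, 3, 1, 4]
```

The last partial quotient of a rational number with at least two terms comes out greater than 1, matching the standard expansion.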
Definition 5.6 Let $x = [a_0, a_1, \ldots, a_n, \ldots]$ be the continued fraction expansion of $x$. For $i \ge 0$, call $\frac{b_i}{c_i} = [a_0, a_1, \ldots, a_i]$ the $i$th asymptotic fraction (convergent) of $x$; in particular,
$$\frac{b_0}{c_0} = \frac{a_0}{1}, \qquad \frac{b_1}{c_1} = \frac{a_1 a_0 + 1}{a_1}.$$
The asymptotic fraction $\frac{b_i}{c_i}$ of the real number $x$ is a reduced fraction, that is, $(b_i, c_i) = 1$, and it has the following properties.


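Lemma 5.21 Let $x = [a_0, a_1, \ldots, a_n, \ldots]$ be the continued fraction expansion of $x$ and $\frac{b_i}{c_i}$ its asymptotic fractions. Then

(i) when $i \ge 2$,
$$b_i = a_i b_{i-1} + b_{i-2}, \quad c_i = a_i c_{i-1} + c_{i-2}; \tag{5.33}$$
(ii) if $i \ge 1$, then
$$b_i c_{i-1} - b_{i-1} c_i = (-1)^{i-1}. \tag{5.34}$$


Proof We prove (i) by induction. Obviously, the proposition holds for $i = 2$, that is,
$$\frac{b_2}{c_2} = [a_0, a_1, a_2] = \frac{a_2(a_1 a_0 + 1) + a_0}{a_2 a_1 + 1} = \frac{a_2 b_1 + b_0}{a_2 c_1 + c_0}.$$
Suppose the proposition holds for $i$, that is,
$$\frac{b_i}{c_i} = \frac{a_i b_{i-1} + b_{i-2}}{a_i c_{i-1} + c_{i-2}}.$$
Since $[a_0, a_1, \ldots, a_i, a_{i+1}] = \left[a_0, a_1, \ldots, a_{i-1}, a_i + \frac{1}{a_{i+1}}\right]$, and $b_{i-1}, b_{i-2}, c_{i-1}, c_{i-2}$ depend only on $a_0, \ldots, a_{i-1}$, we get
$$\frac{b_{i+1}}{c_{i+1}} = \frac{\left(a_i + \frac{1}{a_{i+1}}\right) b_{i-1} + b_{i-2}}{\left(a_i + \frac{1}{a_{i+1}}\right) c_{i-1} + c_{i-2}} = \frac{a_{i+1} b_i + b_{i-1}}{a_{i+1} c_i + c_{i-1}}.$$
So (i) holds.
We prove (5.34) by induction. When $i = 1$,
$$b_1 c_0 - b_0 c_1 = a_1 a_0 + 1 - a_1 a_0 = 1 = (-1)^0.$$
So the proposition holds for $i = 1$. Suppose it holds for $i$, that is,
$$b_i c_{i-1} - b_{i-1} c_i = (-1)^{i-1}.$$
Then
$$\begin{aligned} b_{i+1} c_i - b_i c_{i+1} &= (a_{i+1} b_i + b_{i-1}) c_i - b_i (a_{i+1} c_i + c_{i-1}) \\ &= b_{i-1} c_i - b_i c_{i-1} \\ &= (-1)^i. \end{aligned}$$


Lemma 5.21 holds.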
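The recurrence (5.33) gives an efficient way to compute asymptotic fractions. The sketch below is our own; it starts from the conventional initial values $b_{-2} = 0$, $b_{-1} = 1$, $c_{-2} = 1$, $c_{-1} = 0$ (an assumption not stated in the text, but consistent with $b_0/c_0 = a_0/1$ and $b_1/c_1 = (a_1 a_0 + 1)/a_1$) and checks property (5.34) on the expansion $43/19 = [2, 3, 1, 4]$.

```python
from fractions import Fraction
from math import gcd

def convergents(a):
    """Pairs (b_i, c_i) from partial quotients a_0, a_1, ... by recurrence (5.33),
    starting from b_{-2} = 0, b_{-1} = 1, c_{-2} = 1, c_{-1} = 0."""
    b2, b1, c2, c1 = 0, 1, 1, 0
    out = []
    for ai in a:
        b2, b1 = b1, ai * b1 + b2
        c2, c1 = c1, ai * c1 + c2
        out.append((b1, c1))
    return out

conv = convergents([2, 3, 1, 4])           # expansion of 43/19
print(conv)                                # [(2, 1), (7, 3), (9, 4), (43, 19)]
assert Fraction(*conv[-1]) == Fraction(43, 19)
for i in range(1, len(conv)):
    (b, c), (bp, cp) = conv[i], conv[i - 1]
    assert b * cp - bp * c == (-1) ** (i - 1)   # property (5.34)
    assert gcd(b, c) == 1                        # reduced fraction
```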


Continued fractions have many important applications in number theory, such as the rational approximation of real numbers and of algebraic numbers. Periodic continued fractions are an important special case in the rational approximation of algebraic numbers: if the partial quotients $a_i$ of $x = [a_0, a_1, \ldots, a_n, \ldots]$ eventually repeat in cycles of a fixed length, the expansion is called a periodic continued fraction. The famous Lagrange theorem states that the continued fraction expansion of $x$ is periodic if and only if $x$ is a real quadratic irrational number. Here we do not discuss the deeper properties of continued fractions, but only prove the properties we need.


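Lemma 5.22 Let $x > 1$ be a real number and $\frac{b_i}{c_i}$ $(i \ge 0)$ the asymptotic fractions of $x$. Then
$$|b_i^2 - x^2 c_i^2| < 2x, \quad \forall\, i \ge 0.$$


Proof Because $x$ lies between the asymptotic fractions $\frac{b_i}{c_i}$ and $\frac{b_{i+1}}{c_{i+1}}$, by property (ii) of Lemma 5.21 we have
$$\left|\frac{b_{i+1}}{c_{i+1}} - \frac{b_i}{c_i}\right| = \frac{1}{c_i c_{i+1}}, \quad i \ge 0.$$
Thus
$$|b_i^2 - x^2 c_i^2| = c_i^2 \left|x - \frac{b_i}{c_i}\right| \cdot \left|x + \frac{b_i}{c_i}\right| < c_i^2 \cdot \frac{1}{c_i c_{i+1}} \left(x + \left(x + \frac{1}{c_i c_{i+1}}\right)\right).$$
So
$$\begin{aligned} |b_i^2 - x^2 c_i^2| - 2x &< 2x\left(-1 + \frac{c_i}{c_{i+1}} + \frac{1}{2x c_{i+1}^2}\right) \\ &< 2x\left(-1 + \frac{c_i}{c_{i+1}} + \frac{1}{c_{i+1}}\right) \\ &\le 2x\left(-1 + \frac{c_{i+1}}{c_{i+1}}\right) = 0, \end{aligned}$$
where the last step uses $c_{i+1} = a_{i+1} c_i + c_{i-1} \ge c_i + 1$ for $i \ge 1$. For $i = 0$ the inequality is direct: $|b_0^2 - x^2| = (x - [x])(x + [x]) < 2x$. The Lemma holds.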


Lemma 5.23 Let $n$ be a positive integer that is not a perfect square. Let $\left\{\frac{b_i}{c_i}\right\}_{i \ge 0}$ be the asymptotic fractions of the continued fraction expansion of $\sqrt{n}$, and let $b_i^2 \bmod n$ be the least absolute residue of $b_i^2$ modulo $n$. Then we have
$$|b_i^2 \bmod n| < 2\sqrt{n}, \quad \forall\, i \ge 0.$$
Proof Let $x = \sqrt{n}$ in Lemma 5.22; then
$$b_i^2 \equiv b_i^2 - n c_i^2 \pmod n.$$
Because $|b_i^2 - n c_i^2| < 2\sqrt{n}$ and the least absolute residue has the smallest absolute value among all integers in its residue class, we get $|b_i^2 \bmod n| < 2\sqrt{n}$ for all $i \ge 0$.


The Lemma holds.
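Lemmas 5.22 and 5.23 can be checked exactly with integer arithmetic. The sketch below is our own: `sqrt_cf_terms` uses the standard integer recurrence for the partial quotients of $\sqrt{n}$ (an assumption beyond the text, which defines $a_i$ via $[1/x_{i-1}]$, but it yields the same $a_i$ without floating point), and `check_lemma_5_23` compares $(b_i^2 - n c_i^2)^2$ with $4n$ to avoid square roots.

```python
from math import isqrt

def sqrt_cf_terms(n: int, count: int):
    """First `count` partial quotients of sqrt(n), n not a perfect square (exact integers)."""
    a0 = isqrt(n)
    m, d, a = 0, 1, a0
    terms = [a0]
    while len(terms) < count:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        terms.append(a)
    return terms

def check_lemma_5_23(n: int, count: int = 30) -> bool:
    """Assert |b_i^2 mod n| <= |b_i^2 - n c_i^2| < 2 sqrt(n) for the first `count` convergents."""
    b2, b1, c2, c1 = 0, 1, 1, 0                  # b_{-2}, b_{-1}, c_{-2}, c_{-1}
    for a in sqrt_cf_terms(n, count):
        b2, b1 = b1, a * b1 + b2                 # recurrence (5.33)
        c2, c1 = c1, a * c1 + c2
        diff = b1 * b1 - n * c1 * c1             # congruent to b_i^2 modulo n
        assert diff * diff < 4 * n               # Lemma 5.22 with x = sqrt(n)
        r = (b1 * b1) % n
        r = r - n if r > n // 2 else r           # least absolute residue
        assert abs(r) <= abs(diff)               # hence |b_i^2 mod n| < 2 sqrt(n)
    return True
```

For $n = 9073$ the first partial quotients are $95, 3, 1, 26, 2$, the values used in the example that follows.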


Combining Lemma 5.23 with the factor base method, we obtain the continued fraction factorization method.

Continued fraction factorization method:

Unless otherwise specified, the residues modulo $n$ in this algorithm are least nonnegative residues; the exception is $b_i^2 \bmod n$, which denotes the least absolute residue as in Lemma 5.23. Let $n$ be a large odd composite number. First let $b_{-1} = 1$, $b_0 = a_0 = [\sqrt{n}]$ and $x_0 = \sqrt{n} - a_0 = \{\sqrt{n}\}$, and calculate $b_0^2 \bmod n$; in fact, $b_0^2 \bmod n = b_0^2 - n$. Second, consider $i = 1, 2, \ldots$. To determine $b_i$, we proceed in several steps:


1. Let $a_i = \left[\frac{1}{x_{i-1}}\right]$ and $x_i = \frac{1}{x_{i-1}} - a_i$ $(i \ge 1)$.

2. Let $b_i = a_i b_{i-1} + b_{i-2}$; the least nonnegative residue $b_i \bmod n$ of $b_i$ modulo $n$ is still denoted $b_i$.

3. Calculate $b_i^2 \bmod n$.

By Lemma 5.23, $|b_i^2 \bmod n| < 2\sqrt{n}$, so it is likely to decompose into a product of small primes. If a prime $p$ appears in the decompositions of two or more of the $b_i^2 \bmod n$, or appears to an even power in the decomposition of a single $b_i^2 \bmod n$, then $p$ is called a standard prime. In other words, a standard prime $p$ satisfies
$$p \mid b_i^2 \bmod n,\ p \mid b_j^2 \bmod n,\ i \ne j, \quad \text{or} \quad p^\alpha \,\|\, b_i^2 \bmod n,\ \alpha \text{ even}.$$
We choose the factor base $B$ as
$$B = \{-1\} \cup \{\text{standard primes}\}.$$
In this way, the $b_i$ for which $b_i^2 \bmod n$ factors completely over $B$ are $B$-numbers; let $e_i$ be the corresponding binary vectors. Select a subset $A = \{b_i\}$ of them such that $\sum_{i \in A} e_i = 0$. Let
$$b = \prod_{i \in A} (b_i \bmod n) = \prod_{i \in A} \delta_i$$
and $c = \prod_{p_j \in B} p_j^{r_j}$, where
$$r_j = \frac{1}{2}\sum_{i \in A} \alpha_{ij}, \quad p_j \in B.$$
If $b \not\equiv \pm c \pmod n$, then $(b + c, n)$ is a nontrivial factor of $n$, and we obtain a factorization of $n$. If $b \equiv \pm c \pmod n$, then another subset $A$ is selected and the procedure is repeated. This completes the continued fraction factorization method.


Example 5.8 Use the continued fraction method to factor $n = 9073$.


Solution: We calculate $a_i$, $b_i$ and $b_i^2 \bmod n$ in turn, where $b_i = (a_i b_{i-1} + b_{i-2}) \bmod n$; the table is as follows:

    i               0      1      2      3      4
    a_i            95      3      1     26      2
    b_i            95    286    381   1119   2619
    b_i^2 mod n   -48    139     -7     87    -27

From the values of $b_i^2 \bmod n$, we can choose the factor base $B = \{-1, 2, 3, 7\}$. Then $b_i$ is a $B$-number for $i = 0, 2, 4$. Since $-48 = (-1) \cdot 2^4 \cdot 3$, $-7 = (-1) \cdot 7$ and $-27 = (-1) \cdot 3^3$, the exponent vectors are $(1, 4, 1, 0)$, $(1, 0, 0, 1)$ and $(1, 0, 3, 0)$, so the corresponding binary vectors are
$$e_0 = (1, 0, 1, 0), \quad e_2 = (1, 0, 0, 1), \quad e_4 = (1, 0, 1, 0).$$
It is easy to see that $e_0 + e_4 = (0, 0, 0, 0)$. Therefore, we choose
$$b = 95 \cdot 2619 \equiv 3834 \pmod{9073}, \qquad c = 2^2 \cdot 3^2 = 36.$$
Because $b^2 \equiv c^2 \pmod{9073}$, that is, $3834^2 \equiv 36^2 \pmod{9073}$, but $3834 \not\equiv \pm 36 \pmod{9073}$, we get a nontrivial factor of $n = 9073$, namely $d = (3834 + 36, 9073) = 43$. Thus $9073 = 43 \cdot 211$, and the factorization of 9073 is obtained.
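Example 5.8 can be reproduced end to end. In this sketch of our own, `cf_rows` generates the rows of the table, using the exact integer recurrence for the partial quotients of $\sqrt{n}$ instead of floating-point $[1/x_{i-1}]$. `cf_factor` then searches subsets of $B$-numbers by brute force, which is fine for a toy example but exponential in general; the Gaussian elimination used for the factor base method scales better. As in the text, $c$ is formed without the sign of $-1$.

```python
from itertools import combinations
from math import gcd, isqrt, prod

def cf_rows(n: int, count: int):
    """Rows (i, a_i, b_i mod n, least absolute residue of b_i^2 mod n) for sqrt(n)."""
    a0 = isqrt(n)
    m, d, a = 0, 1, a0
    b_prev, b = 1, a0 % n                    # b_{-1} = 1, b_0 = a_0
    rows = []
    for i in range(count):
        if i > 0:
            m = d * a - m
            d = (n - m * m) // d
            a = (a0 + m) // d                # a_i
            b_prev, b = b, (a * b + b_prev) % n
        r = b * b % n
        rows.append((i, a, b, r - n if r > n // 2 else r))
    return rows

def factor_over(a: int, base):
    """Exponent vector of a over base (base[0] == -1), or None if a does not factor over base."""
    if a == 0:
        return None
    exps = [1 if a < 0 else 0]
    a = abs(a)
    for p in base[1:]:
        e = 0
        while a % p == 0:
            a //= p
            e += 1
        exps.append(e)
    return exps if a == 1 else None

def cf_factor(n: int, base, count: int = 20):
    """Return (b, c, d) with b^2 = c^2, b != +-c (mod n), d = (b + c, n), or None."""
    good = []
    for _, _, b, r in cf_rows(n, count):
        v = factor_over(r, base)
        if v is not None:
            good.append((b, v))
    for size in range(1, len(good) + 1):
        for subset in combinations(good, size):
            sums = [sum(col) for col in zip(*(v for _, v in subset))]
            if any(s % 2 for s in sums):
                continue                     # binary vectors do not sum to 0
            b = prod(bi for bi, _ in subset) % n
            c = prod(p ** (s // 2) for p, s in zip(base[1:], sums[1:])) % n
            if (b - c) % n and (b + c) % n:  # b != +-c (mod n)
                return b, c, gcd(b + c, n)
    return None

for row in cf_rows(9073, 5):
    print(row)
print(cf_factor(9073, [-1, 2, 3, 7], count=5))   # (3834, 36, 43)
```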

Exercise 5 


1. Let $p$ be a prime. Prove that $p^2$ is a Fermat pseudoprime to base $b$ if and only if $b^{p-1} \equiv 1 \pmod{p^2}$.

2. What is the smallest Fermat pseudoprime to base 5? What is the smallest Fermat pseudoprime to base 2?

3. Let $n = pq$, where $p \ne q$ are two primes, and let $d = (p - 1, q - 1)$. Prove that $n$ is a Fermat pseudoprime to base $b$ if and only if $b^d \equiv 1 \pmod n$, and calculate the number of such bases $b$.

4. If $b \in \mathbb{Z}_n^*$ and $n$ is a Fermat pseudoprime to base $b$, prove that $n$ is a Fermat pseudoprime to both bases $b$ and $-b$.

5. If $n$ is a Fermat pseudoprime to base 2, prove that $N = 2^n - 1$ is also a Fermat pseudoprime to base 2.

6. If $n$ is a Fermat pseudoprime to base $b$ and $(b - 1, n) = 1$, prove that $N = \frac{b^n - 1}{b - 1}$ is also a Fermat pseudoprime to base $b$.




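7. Prove that the following integers are Carmichael numbers:
$1105 = 5 \cdot 13 \cdot 17$, $1729 = 7 \cdot 13 \cdot 19$, $2465 = 5 \cdot 17 \cdot 29$, $2821 = 7 \cdot 13 \cdot 31$,
$6601 = 7 \cdot 23 \cdot 41$, $29{,}341 = 13 \cdot 37 \cdot 61$, $172{,}081 = 7 \cdot 13 \cdot 31 \cdot 61$, $278{,}545 = 5 \cdot 17 \cdot 29 \cdot 113$.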


8. Find all Carmichael numbers of the form $3pq$ and all Carmichael numbers of the form $5pq$.
9. Prove that 561 is the smallest Carmichael number.
10. If $n$ is a Fermat pseudoprime to base 2, prove that $N = 2^n - 1$ is a strong pseudoprime to base 2.
11. Prove that there are infinitely many Euler pseudoprimes and infinitely many strong pseudoprimes to base 2.
12. If $n$ is a strong pseudoprime to base $b$, prove that $n$ is also a strong pseudoprime to base $b^k$ for any integer $k$.
13. Use the Fermat factorization method to factor the following positive integers:

n = 8633, n = 809,009, n = 92,296,873, n = 88,169,891.


14. Use the Fermat factorization method to factor the following positive integers:

n = 68,987, n = 29,895,581, n = 19,578,079, n = 17,018,759.


15. Expand the rational number x — S, X= 5, x = 1.13 into continued fractions. 
16. Let $a$ be a positive integer and $x = [a, a, a, \ldots]$. Find $x$.
Chapter 6
Elliptic Curve


In 1985, the mathematician V. Miller introduced elliptic curves into cryptography for the first time. In 1987, the mathematician N. Koblitz further developed Miller's work, and the well-known elliptic curve public key cryptosystem took shape. The elliptic curve public key cryptosystem, the RSA public key cryptosystem and the ElGamal public key cryptosystem based on discrete logarithms are recognized as the three major public key cryptosystems, and they occupy the most prominent position in modern cryptography. Compared with RSA, elliptic curve cryptography provides the same or a higher level of security with a shorter key. Compared with the ElGamal cryptosystem, it rests on the same mathematical principle: both are discrete logarithm cryptosystems. The ElGamal cryptosystem is based on the discrete logarithm in the multiplicative group of a finite field, while the elliptic curve cryptosystem is based on the discrete logarithm in the Mordell group of an elliptic curve over a finite field. Because there is much more freedom in choosing an elliptic curve than in choosing a finite field, the elliptic curve cryptosystem has attracted more attention. This chapter introduces elliptic curve cryptography systematically and comprehensively from three aspects, namely the basic theory, the cryptographic mechanisms and factorization, so that readers can better understand and master this public key mechanism.


6.1 Basic Theory 


The working platform of this chapter is a field $E$, in particular one of the four common fields $E = \mathbb{R}$ (the real number field), $E = \mathbb{C}$ (the complex number field), $E = \mathbb{Q}$ (the rational number field) or $E = \mathbb{F}_q$ (the finite field with $q$ elements). The characteristic $\chi(E)$ of a field $E$ is the order of the multiplicative identity $e$ of $E$ in the additive group, that is, $\chi(E) = o(e)$, which is a prime number or $\infty$. Specifically,
$$\chi(E) = \begin{cases} \infty, & \text{if } E = \mathbb{C}, \mathbb{R}, \mathbb{Q}, \\ p, & \text{if } E = \mathbb{F}_q,\ q = p^r. \end{cases}$$
Definition 6.1 (i) Suppose $E$ is a field with characteristic $\chi(E) \neq 2, 3$, and $f(x) = x^3 + ax + b \in E[x]$ is a cubic polynomial with no multiple roots in its splitting field. An elliptic curve over $E$ is the set of finite points $(x, y) \in E^2$ on the "plane" together with a point at infinity, where the finite points $(x, y)$ satisfy
$$y^2 = x^3 + ax + b, \qquad a, b \in E \text{ given}.$$
$C_E$ denotes the elliptic curve and "O" denotes the point at infinity, i.e.,
$$C_E = \{(x, y) \in E^2 \mid y^2 = x^3 + ax + b\} \cup \{O\}. \tag{6.1}$$


(ii) If $\chi(E) = 2$, an elliptic curve $C_E$ over the field $E$ of characteristic 2 is defined as
$$C_E = \{(x, y) \in E^2 \mid y^2 + y = x^3 + ax + b\} \cup \{O\}. \tag{6.2}$$

(iii) If $\chi(E) = 3$ and $x^3 + ax^2 + bx + c \in E[x]$ has no multiple roots in its splitting field, an elliptic curve $C_E$ over $E$ is defined as
$$C_E = \{(x, y) \in E^2 \mid y^2 = x^3 + ax^2 + bx + c\} \cup \{O\}. \tag{6.3}$$


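Let $F(x, y) \in E[x, y]$ be a polynomial in two variables; then $F(x, y) = 0$ defines an algebraic curve $C$ over $E$. A point $(x_0, y_0) \in C$ is called a nonsingular point of $C$ if at least one of the partial derivatives $\partial F/\partial x$ and $\partial F/\partial y$ is nonzero at $(x_0, y_0)$. If $\chi(E) \neq 2, 3$, let $f(x) = x^3 + ax + b$; then all finite points of the elliptic curve $F(x, y) = y^2 - f(x) = 0$ over $E$ are nonsingular. Indeed, $\partial F/\partial y = 2y$ and $\partial F/\partial x = -f'(x)$ both vanish at a point $(x_0, y_0)$ of the curve only if $y_0 = 0$ and $f(x_0) = f'(x_0) = 0$, i.e., only if $x_0$ is a multiple root of $f$, which is excluded. The same holds when $\chi(E) = 2$ or $\chi(E) = 3$. Therefore, an elliptic curve is also called a nonsingular cubic curve.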

Among the many profound arithmetic properties of elliptic curves, the Mordell group is the most beautiful and important. We first introduce the Mordell group over the familiar real number field $E = \mathbb{R}$ and then extend it to finite fields.

Elliptic curves over the real number field


Definition 6.2 Let $E = \mathbb{R}$ be the real number field, $C_E$ an elliptic curve, and $P, Q \in C_E$ two points on it. We define addition according to the following rules:

(1) If $P = O$ is the point at infinity, define $O + O = O$; that is, the point at infinity is the identity element of addition, and its negative is $-O = O$.

(2) If $P = (x, y) \in C_E$ is a finite point, define $-P = (x, -y)$. Clearly $-P \in C_E$; it is the mirror image of $P$ in the $x$-axis.

(3) If $P, Q \in C_E$ are two finite points with different $x$-coordinates (i.e., $P = (x_1, y_1)$, $Q = (x_2, y_2)$, $x_1 \neq x_2$), then the line through $P$ and $Q$ in the $xy$-plane meets the elliptic curve in exactly one further point $R$; define $P + Q = -R$, the mirror image of $R$. If $Q = O$ is the point at infinity, define $P + O = P$.

(4) If $Q = -P$, that is, $P$ and $Q$ have the same $x$-coordinate and opposite $y$-coordinates, define $P + Q = O$.

(5) If $P = Q$ is a finite point on $C_E$, then the tangent to $C_E$ at $P$ meets $C_E$ in exactly one further point $R$; define $P + P = -R$.


We have defined addition on $C_E$ by a geometric construction. That the line through two finite points with different $x$-coordinates, and the tangent at a finite point, each meet $C_E$ in exactly one further point requires a rigorous proof, which we give in the following lemma.


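Lemma 6.1 Let $P = (x_1, y_1)$, $Q = (x_2, y_2)$ be two finite points on the elliptic curve $C_E$.

(i) If $x_1 \neq x_2$, the line through $P$ and $Q$ meets $C_E$ in exactly one further point $R$ (counted with multiplicity), and $P + Q = -R = (x_3, y_3)$, where
$$x_3 = \left(\frac{y_2 - y_1}{x_2 - x_1}\right)^2 - x_1 - x_2, \qquad y_3 = -y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x_1 - x_3). \tag{6.4}$$

(ii) If $P = Q$ and $y_1 \neq 0$, let $\alpha = (3x_1^2 + a)/(2y_1)$ be the value of the derivative $dy/dx$ at $P$. Then the tangent to $C_E$ at $P$ meets $C_E$ in exactly one further point $R$, and $P + P = -R = (x_3, y_3)$, where
$$x_3 = \left(\frac{3x_1^2 + a}{2y_1}\right)^2 - 2x_1, \qquad y_3 = -y_1 + \frac{3x_1^2 + a}{2y_1}(x_1 - x_3). \tag{6.5}$$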


Proof Let the equation of the line through $P$ and $Q$ in the $xy$-plane be $y = \alpha x + \beta$, where
$$\alpha = \frac{y_2 - y_1}{x_2 - x_1}, \qquad \beta = y_1 - \alpha x_1.$$
A point $(x, \alpha x + \beta)$ on this line lies on the elliptic curve $C_E$ if and only if
$$(\alpha x + \beta)^2 = x^3 + ax + b. \tag{6.6}$$
Therefore, the intersection points correspond to the three roots (counted with multiplicity) of the cubic equation $x^3 - (\alpha x + \beta)^2 + ax + b = 0$, and each root produces an intersection point. Since $P$ and $Q$ are intersection points, $x_1$ and $x_2$ are two of the roots, so there is exactly one third intersection point $R = (x_3, \alpha x_3 + \beta)$. Comparing the coefficients of $x^2$, the three roots $x_1, x_2, x_3$ of (6.6) satisfy
$$x_1 + x_2 + x_3 = \alpha^2.$$
Hence
$$x_3 = \left(\frac{y_2 - y_1}{x_2 - x_1}\right)^2 - x_1 - x_2, \qquad \alpha x_3 + \beta = y_1 + \alpha(x_3 - x_1),$$
so $-R = (x_3, -y_1 + \alpha(x_1 - x_3))$, and (6.4) holds. If $Q$ approaches $P$, the line through them becomes the tangent to $C_E$ at $P$; differentiating $y^2 = x^3 + ax + b$ implicitly gives
$$\alpha = \left.\frac{dy}{dx}\right|_{(x_1, y_1)} = \frac{3x_1^2 + a}{2y_1}.$$
Now $x_1$ is a double root of (6.6), so the tangent meets $C_E$ in exactly one further point $R = (x_3, \alpha x_3 + \beta)$, where
$$x_3 = \alpha^2 - 2x_1 = \left(\frac{3x_1^2 + a}{2y_1}\right)^2 - 2x_1, \qquad -R = \left(x_3,\ -y_1 + \frac{3x_1^2 + a}{2y_1}(x_1 - x_3)\right).$$
Hence (6.5) holds, which completes the proof of the lemma (Fig. 6.1).


Example 6.1 On the real plane, we use the specific elliptic curve $y^2 = x^3 - x$ to illustrate the addition rule (Fig. 6.1). The points of $C_E$ in the left half-plane are called the torsion points of $C_E$, and the points of $C_E$ in the right half-plane are called the free points of $C_E$.


Remark 6.1 In Lemma 6.1, if $P = (x_1, 0)$, that is, $y_1 = 0$, then the tangent to $C_E$ at $P$ is vertical, and its only further intersection with $C_E$ is defined to be the point at infinity "O"; thus $P + P = O$.


From Definition 6.2 and Lemma 6.1, we have the following important corollary.


Fig. 6.1 Elliptic curve $y^2 = x^3 - x$
Corollary 6.1 (i) All points of the elliptic curve $C_E$ form an Abelian group under addition, in which the point at infinity "O" is the zero element. This group is called the Mordell group.

(ii) If $a, b \in \mathbb{Q}$ and $P = (x_1, y_1)$, $Q = (x_2, y_2)$ are rational points, that is, $x_1, y_1, x_2, y_2$ are rational numbers, then the third intersection point $R$ of the line through $P$ and $Q$ with $C_E$ is also a rational point.


Proof (i) follows from Definition 6.2: commutativity is clear from the construction, while associativity requires a longer computation, which we omit. Conclusion (ii) follows directly from formulas (6.4) and (6.5) of Lemma 6.1.
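Formulas (6.4) and (6.5), together with rules (3) and (4) of Definition 6.2 and Remark 6.1, translate directly into code. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`), so it also illustrates Corollary 6.1 (ii): sums of rational points stay rational. The curve $y^2 = x^3 - 25x$ and the point $(-4, 6)$ are our own illustrative choices, and `None` stands for the point at infinity $O$.

```python
from fractions import Fraction

O = None  # the point at infinity, zero element of the Mordell group


def on_curve(P, a, b):
    """Check y^2 = x^3 + a x + b (the point at infinity always lies on the curve)."""
    if P is O:
        return True
    x, y = P
    return y * y == x ** 3 + a * x + b


def ec_add(P, Q, a):
    """P + Q on y^2 = x^3 + a x + b following Definition 6.2 and (6.4), (6.5)."""
    if P is O:
        return Q
    if Q is O:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return O                                # Q = -P, including 2P = O when y1 = 0
    if x1 != x2:
        alpha = (y2 - y1) / (x2 - x1)           # slope of the chord, (6.4)
        x3 = alpha * alpha - x1 - x2
    else:
        alpha = (3 * x1 * x1 + a) / (2 * y1)    # slope of the tangent, (6.5)
        x3 = alpha * alpha - 2 * x1
    y3 = -y1 + alpha * (x1 - x3)                # y-coordinate of -R = P + Q
    return (x3, y3)


a, b = -25, 0
P = (Fraction(-4), Fraction(6))
T = (Fraction(0), Fraction(0))                  # a point of order 2
print(ec_add(P, P, a))   # (Fraction(1681, 144), Fraction(-62279, 1728))
print(ec_add(P, T, a))   # (Fraction(25, 4), Fraction(75, 8))
```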


Elliptic curves over the rational number field

Let $E = \mathbb{Q}$; then $a, b, c$ in Definition 6.1 are rational numbers. Elliptic curves over the rational number field are one of the most important research topics in modern number theory, and many important results and famous number theory problems are related to them, such as the famous BSD conjecture and the ancient congruent number problem. Mordell's theorem is the most basic result on elliptic curves over the rational number field. Since cryptography is concerned only with elliptic curves over finite fields, here we briefly introduce some important results without proof.

Let $C_E$ be an elliptic curve over the rational number field. By Corollary 6.1, all points of $C_E$ form an Abelian group $G$. In algebra, an Abelian group is the same thing as a module over the ring of integers, so an Abelian group is also called a $\mathbb{Z}$-module. Regard the Mordell group of $C_E$ as a $\mathbb{Z}$-module $G$. By the structure theorem for finitely generated modules over a principal ideal domain, a finitely generated $\mathbb{Z}$-module decomposes as the direct sum of a torsion module and a free module. Therefore (finite generation being guaranteed by Mordell's theorem below), the Mordell group $G$ of $C_E$ satisfies
$$G = \mathrm{Tor}(G) \oplus \mathrm{Free}(G).$$


Mordell first proved the following important result.

Mordell's Theorem: The Abelian group $G$ of an elliptic curve $C_E$ over the rational number field $E = \mathbb{Q}$ is finitely generated; in other words, $G$ is a finitely generated $\mathbb{Z}$-module. Therefore, the Mordell group $G$ decomposes as
$$G = \mathrm{Tor}(G) \oplus \mathbb{Z}\alpha_1 \oplus \mathbb{Z}\alpha_2 \oplus \cdots \oplus \mathbb{Z}\alpha_r,$$
where $\alpha_1, \alpha_2, \ldots, \alpha_r$ is a basis of the free module $\mathrm{Free}(G)$ and $r$ is its rank. The rank $r$ is only known to be finite; how to compute it is a famous problem in number theory. The BSD (Birch and Swinnerton-Dyer) conjecture asserts that $r$ equals the order of vanishing at $s = 1$ of the $L$-function of the elliptic curve, but it has not been fully proved.

Another problem related to elliptic curves is the ancient congruent number problem, which can be traced back to Plato's time in ancient Greece.

The congruent number problem: given a positive integer $n$, is there a right triangle with rational side lengths whose area is exactly $n$?

This problem is equivalent to asking whether the elliptic curve $y^2 = x^3 - n^2 x$ has rank $r > 0$. At present, the problem has not been completely solved; the Chinese mathematician Prof. Tian Ye has made substantial progress on it.
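One direction of this equivalence is explicit: a rational right triangle with legs $a, b$, hypotenuse $c$ and area $n = ab/2$ gives the rational point $(c^2/4,\ c(b^2 - a^2)/8)$ on $y^2 = x^3 - n^2 x$, because $x - n = (a - b)^2/4$ and $x + n = (a + b)^2/4$. The Python sketch below checks this map on two triangles of our own choosing; the function name `triangle_to_point` is hypothetical.

```python
from fractions import Fraction


def triangle_to_point(a, b, c):
    """Map a rational right triangle (legs a, b, hypotenuse c) of area n = ab/2
    to a rational point (x, y) on the curve y^2 = x^3 - n^2 x."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a * a + b * b != c * c:
        raise ValueError("not a right triangle")
    n = a * b / 2
    x = c * c / 4
    y = c * (b * b - a * a) / 8
    assert y * y == x ** 3 - n * n * x   # the point really lies on the curve
    return n, (x, y)


print(triangle_to_point(3, 4, 5))
# (Fraction(6, 1), (Fraction(25, 4), Fraction(35, 8)))
print(triangle_to_point(Fraction(3, 2), Fraction(20, 3), Fraction(41, 6)))
# (Fraction(5, 1), (Fraction(1681, 144), Fraction(62279, 1728)))
```

The second triangle shows that 5 is a congruent number, so the curve $y^2 = x^3 - 25x$ has a rational point with $y \neq 0$.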

Elliptic curves over finite fields 

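Let $E = \mathbb{F}_q$ be a finite field with $q$ elements, $q = p^r$, where $p$ is a prime. Let
$$F(x, y) = \begin{cases} y^2 - f(x), & \text{if } p \neq 2, \\ y^2 + y - f(x), & \text{if } p = 2, \end{cases} \tag{6.7}$$
where
$$f(x) = \begin{cases} x^3 + ax^2 + bx + c, & \text{if } p = 3, \\ x^3 + ax + b, & \text{if } p \neq 3. \end{cases} \tag{6.8}$$
Then an elliptic curve $C_E$ over $\mathbb{F}_q$ is defined as
$$C_E = \{(x, y) \in \mathbb{F}_q^2 \mid F(x, y) = 0\} \cup \{O\}, \tag{6.9}$$
where "O" is the point at infinity.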

Obviously, $C_E$ has only finitely many points. Let $N_q = |C_E|$, called the number of points of the elliptic curve over $\mathbb{F}_q$. The bound $N_q \le 2q + 1$ is a trivial estimate, because each $x \in \mathbb{F}_q$ has at most two $y$ values, plus the point at infinity. A more accurate estimate of $N_q$ depends on the Riemann hypothesis for function fields in one variable over finite fields, proved by A. Weil, which is a very profound result in mathematics. For elliptic curves, H. Hasse proved the following result.


Theorem 6.1 (Hasse's Theorem) Let $N_q$ be the number of points of the elliptic curve $F(x, y) = 0$ over $\mathbb{F}_q$. Then
$$|N_q - (q + 1)| \le 2\sqrt{q}.$$


Proof Suppose $q$ is odd and let $\chi$ be the quadratic character of $\mathbb{F}_q$, that is,
$$\chi(a) = \begin{cases} 0, & \text{if } a = 0, \\ +1, & \text{if } a = b^2,\ b \in \mathbb{F}_q^*, \\ -1, & \text{otherwise}. \end{cases}$$
By definition, the number of solutions of $u^2 = a$ in $\mathbb{F}_q$ is $1 + \chi(a)$. So if $N_q$ is the number of points of the elliptic curve $F(x, y) = 0$ over $\mathbb{F}_q$, where $F(x, y)$ is given by (6.7) with $f(x) = x^3 + ax + b$, then
$$N_q = 1 + \sum_{x \in \mathbb{F}_q} \bigl(1 + \chi(x^3 + ax + b)\bigr) = q + 1 + \sum_{x \in \mathbb{F}_q} \chi(x^3 + ax + b). \tag{6.10}$$


We use $\mathbb{F}_q(x)$ to denote the rational function field over $\mathbb{F}_q$; then the function field in one variable defined by $y^2 = f(x)$ is a quadratic extension of $\mathbb{F}_q(x)$, of genus 1. Hasse proved that the Riemann hypothesis holds for this special function field; that is, all zeros of the corresponding zeta function lie on the line $\operatorname{Re}(s) = \frac{1}{2}$, i.e., $s = \frac{1}{2} + it$. A special case of this result is, with $d = \deg f = 3$,
$$\Bigl|\sum_{x \in \mathbb{F}_q} \chi(x^3 + ax + b)\Bigr| \le (d - 1)\sqrt{q} = 2\sqrt{q}. \tag{6.11}$$
By (6.10),
$$|N_q - (q + 1)| \le 2\sqrt{q}.$$
This completes the proof.
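Formula (6.10) and the bound of Theorem 6.1 are easy to check numerically over a prime field $\mathbb{F}_p$ with $p$ odd. In the Python sketch below the quadratic character is computed by Euler's criterion $\chi(a) \equiv a^{(p-1)/2} \pmod p$; the curves and primes are arbitrary illustrative choices.

```python
def chi(a, p):
    """Quadratic character of F_p (p an odd prime), via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def count_by_character(a, b, p):
    """N_p = p + 1 + sum_x chi(x^3 + a x + b), formula (6.10)."""
    return p + 1 + sum(chi(x ** 3 + a * x + b, p) for x in range(p))


def count_brute_force(a, b, p):
    """Count the affine solutions of y^2 = x^3 + a x + b directly, plus the point O."""
    roots = {}                                 # v -> number of y with y^2 = v
    for y in range(p):
        v = y * y % p
        roots[v] = roots.get(v, 0) + 1
    return 1 + sum(roots.get((x ** 3 + a * x + b) % p, 0) for x in range(p))


for a, b, p in [(2, 3, 97), (1, 1, 101), (-1, 0, 103)]:
    N = count_by_character(a, b, p)
    assert N == count_brute_force(a, b, p)
    assert (N - (p + 1)) ** 2 <= 4 * p         # Hasse: |N - (p + 1)| <= 2 sqrt(p)
    print(p, N)
```

The squared form of the Hasse inequality avoids floating-point square roots.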


Remark 6.2 The sum in (6.11) is called a character sum over a finite field. Let $g(x) \in \mathbb{F}_q[x]$ be a polynomial and $\chi$ a nontrivial multiplicative character of $\mathbb{F}_q$ of order $m$, where $g$ is not a constant multiple of an $m$-th power. By A. Weil's famous theorem, we have the following general character sum estimate:
$$\Bigl|\sum_{x \in \mathbb{F}_q} \chi(g(x))\Bigr| \le (\deg g - 1)\sqrt{q}.$$


Let us briefly introduce A. Weil's theorem. Let $\mathbb{F}_{q^n}$ be the extension of degree $n$ of $\mathbb{F}_q$, that is, $n = [\mathbb{F}_{q^n} : \mathbb{F}_q]$, and let $N_{q^n}$ be the number of points of the elliptic curve $F(x, y) = 0$ over $\mathbb{F}_{q^n}$. The zeta function $Z(T, C_E)$ of the elliptic curve $C_E$ is defined as the formal power series in $T$
$$Z(T) = Z(T, C_E) = \exp\left(\sum_{n=1}^{\infty} \frac{N_{q^n}}{n} T^n\right), \tag{6.12}$$
where $\exp(a) = e^a$ is the exponential function. A. Weil proved that $Z(T)$ is a rational function, namely
$$Z(T) = \frac{qT^2 - aT + 1}{(1 - T)(1 - qT)}, \tag{6.13}$$
where $a$ is an integer depending on the elliptic curve $C_E$; in fact, a formula of this type holds for general algebraic curves. Comparing the coefficients of $T$ gives $N_q = q + 1 - a$, and Hasse's theorem gives $a^2 - 4q \le 0$. Therefore the numerator factors as $qT^2 - aT + 1 = (1 - \alpha_1 T)(1 - \alpha_2 T)$, where $\alpha_1, \alpha_2$ are complex conjugates with $|\alpha_1| = |\alpha_2| = \sqrt{q}$. This is the Riemann hypothesis for elliptic curves; A. Weil was the first to prove it for general algebraic curves. See Chap. 5 of Silverman (1986, reference 6) for the proof.

From A. Weil's theorem above, take logarithms on both sides of (6.12) and compare the coefficients on both sides of the formal power series. Letting $N_{q^n}$ be the number of points of the elliptic curve over $\mathbb{F}_{q^n}$, we obtain $N_{q^n} = q^n + 1 - \alpha_1^n - \alpha_2^n$, and therefore
$$|N_{q^n} - (q^n + 1)| \le 2q^{n/2} \quad (n \ge 1). \tag{6.14}$$
This inequality can also be derived directly from Hasse's theorem applied over $\mathbb{F}_{q^n}$.

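Now let us look at a specific elliptic curve over $\mathbb{F}_2$, $y^2 + y = x^3$, to better understand A. Weil's theorem. The curve $F(x, y) = y^2 + y - x^3 = 0$ has three points over $\mathbb{F}_2$, namely $(0, 0)$, $(0, 1)$ and $O$. So $N_2 = 3 = 2 + 1 - a$ gives $a = 0$, and the zeta function of the elliptic curve is
$$Z(T) = \exp\left(\sum_{n=1}^{\infty} \frac{N_{2^n}}{n} T^n\right) = \frac{2T^2 + 1}{(1 - T)(1 - 2T)}.$$
Write $2T^2 + 1 = (1 - \alpha_1 T)(1 - \alpha_2 T)$, where $\alpha_1 = i\sqrt{2}$, $\alpha_2 = -i\sqrt{2}$. Taking logarithms on both sides of the above formula and comparing the coefficients of $T^n$, we get
$$N_{2^n} = \begin{cases} 2^n + 1, & \text{if } n \text{ is odd}, \\ 2^n + 1 - 2(-2)^{n/2}, & \text{if } n \text{ is even}, \end{cases}$$
where $N_{2^n}$ denotes the number of points of the elliptic curve $y^2 + y = x^3$ over $\mathbb{F}_{2^n}$.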
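The closed formula for $N_{2^n}$ can be confirmed by brute force for small $n$. The Python sketch below represents $\mathbb{F}_{2^n}$ as $\mathbb{F}_2[t]/(g(t))$ with elements stored as bit masks; the irreducible polynomials in `IRREDUCIBLE` are our own choice (any irreducible polynomial of degree $n$ would do), and addition in characteristic 2 is bitwise XOR.

```python
# Irreducible polynomials g(t) over F_2; bit i holds the coefficient of t^i (our choice).
IRREDUCIBLE = {1: 0b11, 2: 0b111, 3: 0b1011, 4: 0b10011, 5: 0b100101, 6: 0b1000011}


def gf_mul(u, v, n):
    """Multiply u, v in F_{2^n} = F_2[t]/(g(t)) by shift-and-add."""
    g = IRREDUCIBLE[n]
    result = 0
    while v:
        if v & 1:
            result ^= u            # add u (XOR is addition in characteristic 2)
        v >>= 1
        u <<= 1
        if (u >> n) & 1:           # degree reached n: reduce modulo g(t)
            u ^= g
    return result


def count_points(n):
    """N_{2^n}: points of y^2 + y = x^3 over F_{2^n}, including O."""
    size = 1 << n
    count = 1                      # the point at infinity
    for x in range(size):
        x3 = gf_mul(gf_mul(x, x, n), x, n)
        for y in range(size):
            if (gf_mul(y, y, n) ^ y) == x3:
                count += 1
    return count


def weil_formula(n):
    """N_{2^n} predicted by Z(T) = (2T^2 + 1) / ((1 - T)(1 - 2T))."""
    if n % 2 == 1:
        return 2 ** n + 1
    return 2 ** n + 1 - 2 * (-2) ** (n // 2)


for n in range(1, 7):
    print(n, count_points(n), weil_formula(n))
```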
Finally, the Mordell group of an elliptic curve over $\mathbb{F}_q$ is a finite Abelian group of order $N_q$. By the classification theorem of finite Abelian groups, together with the structure of elliptic curves, this group is either cyclic or the direct sum of two cyclic groups; this will be explained further when necessary.


6.2 Elliptic Curve Public Key Cryptosystem 


An elliptic curve over a finite field $\mathbb{F}_q$ forms a finite Abelian group $G$, which plays a role similar to the multiplicative group $\mathbb{F}_q^*$; therefore, an elliptic curve public key cryptosystem can be constructed using discrete logarithms. Compared with other public key cryptosystems based on discrete logarithms (such as the ElGamal cryptosystem), the elliptic curve cryptosystem has greater flexibility: once a huge $q$ is selected, the working platform of the ElGamal cryptosystem is the single multiplicative group $\mathbb{F}_q^*$, but many elliptic curves can be defined over $\mathbb{F}_q$, so there are many Mordell groups to choose from. Thus the elliptic curve cryptosystem offers greater concealment and security.
Before introducing elliptic curve cryptography, we first discuss computational complexity in the two kinds of groups. The complexity of exponentiation in a finite field $\mathbb{F}_q$ was discussed in Lemma 4.12 of Chap. 4; in particular, for $\alpha \in \mathbb{F}_q$ and an integer $k$, $\mathrm{Time}(\alpha^k) = O(\log k \log^2 q)$. In the case of elliptic curves, the Mordell group $G$ is written additively: for a point $P \in G$, $kP$ denotes the sum of $k$ copies of $P$.


Lemma 6.2 Let $E = \mathbb{F}_q$ be a finite field with $q$ elements, $C_E$ an elliptic curve over $\mathbb{F}_q$ defined by the Weierstrass equations (6.7), (6.8) and (6.9), $G$ the Mordell group of $C_E$, and $P \in G$. Then for any positive integer $k$,
$$\mathrm{Time}(kP) = \begin{cases} O(\log k \log^2 q), & \text{if } k \le N_q, \\ O(\log^3 q), & \text{if } k > N_q, \end{cases}$$
where $N_q$ is the number of points of $C_E$, i.e., the order of the Mordell group $G$.


Proof Let $P = (x, y)$ with $y \neq 0$; then $P + P = (x', y')$, where $x'$ and $y'$ are determined by (6.5) (and the sum of two points with different $x$-coordinates by (6.4)). These formulas involve at most 20 arithmetic operations (additions, subtractions, multiplications and divisions) in $\mathbb{F}_q$, each costing $O(\log^2 q)$ bit operations. By the "repeated squaring" method, which here becomes repeated doubling, the computation of $kP$ reduces to $O(\log k)$ doublings and additions, so
$$\mathrm{Time}(kP) = O(\log k \log^2 q).$$
If $y = 0$, then $P + P = O$ by definition, so $kP$ is $P$ or $O$ according to the parity of $k$, and $\mathrm{Time}(kP) = O(\log k)$.

If $k > N_q$, then since $N_q P = O$, write $k = sN_q + r$ with $0 \le r < N_q$; thus $kP = rP$, and it suffices to compute $rP$. Hence
$$\mathrm{Time}(kP) = O(\log N_q + \log N_q \log^2 q) = O(\log^3 q).$$
Here we use Hasse's theorem: $N_q = q + 1 + O(\sqrt{q})$, so $N_q = O(q)$ and $\log N_q = O(\log q)$.


Lemma 6.2 holds. 
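The repeated-doubling idea in the proof of Lemma 6.2 is the elliptic curve analogue of repeated squaring. The Python sketch below works over a prime field $\mathbb{F}_p$ with $p > 3$, an assumption made only to keep the field arithmetic short (inverses use Python's built-in `pow(x, -1, p)`, available since Python 3.8). The curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$ is an arbitrary toy example, and `None` stands for $O$.

```python
def ec_add(P, Q, a, p):
    """P + Q on y^2 = x^3 + a x + b over F_p (p > 3 prime); None is the point O."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                           # Q = -P
    if x1 != x2:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p       # chord slope, (6.4)
    else:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p  # tangent slope, (6.5)
    x3 = (lam * lam - x1 - x2) % p
    y3 = (-y1 + lam * (x1 - x3)) % p
    return (x3, y3)


def scalar_mult(k, P, a, p):
    """kP for k >= 0 by double-and-add: O(log k) group operations."""
    result, addend = None, P
    while k > 0:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)                 # doubling step
        k >>= 1
    return result


def all_points(a, b, p):
    """Brute-force list of the Mordell group (feasible only for tiny p)."""
    pts = [None]
    for x in range(p):
        for y in range(p):
            if (y * y - (x ** 3 + a * x + b)) % p == 0:
                pts.append((x, y))
    return pts


a, b, p = 2, 3, 97
G = all_points(a, b, p)
N = len(G)                                                    # N_q, the group order
P = G[1]
assert scalar_mult(N, P, a, p) is None                        # N_q P = O
assert scalar_mult(10 ** 6, P, a, p) == scalar_mult(10 ** 6 % N, P, a, p)  # kP = rP
```

The last assertion mirrors the reduction $k = sN_q + r$ used in the proof.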


Secondly, we consider how to map a plaintext unit $m$ to a point on a given elliptic curve $C_E$, which is a necessary first step for encryption using elliptic curves. Unfortunately, no deterministic polynomial-time algorithm is known that maps an arbitrary large integer $m$ to a point on an arbitrary elliptic curve. Instead, we can only use a probabilistic algorithm with a sufficiently low failure probability to realize the correspondence from numbers to points. A probabilistic algorithm does not guarantee success every time (each run depends on luck), but its success probability must be large enough, that is,
$$P\{\text{number-to-point correspondence succeeds}\} \ge 1 - \varepsilon, \qquad \varepsilon > 0 \text{ sufficiently small}.$$
Next, we introduce a probabilistic algorithm that realizes the correspondence from numbers to points, as theoretical preparation for the construction of elliptic curve cryptosystems.

Probabilistic algorithm

Treat each plaintext unit as an integer $m$, $0 \le m < M$, and let $k$ be a positive integer. Select a finite field $\mathbb{F}_q$, $q = p^r$, with $q > kM$. We write each integer $n$ from 1 to $kM$ as
$$1 \le n \le kM, \qquad n = mk + j, \quad 0 \le m < M, \quad 1 \le j \le k. \tag{6.15}$$


Lemma 6.3 There is a one-to-one correspondence $\tau$ between the set of integers $A = \{1, 2, \ldots, kM\}$ and a subset of the finite field $\mathbb{F}_q$ $(q > kM)$.

Proof Since $q = p^r$, let $g(x) \in \mathbb{F}_p[x]$ be a monic irreducible polynomial of degree $r$. By the theory of finite extensions of fields, $\mathbb{F}_q$ is isomorphic to a quotient ring of the polynomial ring $\mathbb{F}_p[x]$ over the subfield $\mathbb{F}_p$, that is,
$$\mathbb{F}_q \cong \mathbb{F}_p[x]/\langle g(x) \rangle = \{a_0 + a_1 x + \cdots + a_{r-1}x^{r-1} \mid a_i \in \mathbb{F}_p\}.$$
Each element $\alpha \in \mathbb{F}_q$ corresponds uniquely to a polynomial $a_0 + a_1 x + \cdots + a_{r-1}x^{r-1}$; we write
$$\alpha = (a_{r-1} a_{r-2} \cdots a_1 a_0)_p,$$
which is called the $p$-ary representation of $\alpha$.

For every $m$, $0 \le m < M$, and every $j$, $1 \le j \le k$, there corresponds a unique $n = mk + j$. Since $n \le kM < q = p^r$, $n$ has at most $r$ digits in base $p$; if $n = (a_{r-1} \cdots a_1 a_0)_p$, let $\tau(n) = \alpha \in \mathbb{F}_q$ be the element with the same $p$-ary representation. By the uniqueness of $p$-ary representations, $\tau$ is an injection. Let
$$A' = \{\tau(n) \mid 1 \le n \le kM\} \subset \mathbb{F}_q.$$
Then $\tau$ is a one-to-one correspondence $A \to A'$, and the lemma holds.


Next, for each $m$ $(0 \le m < M)$, we establish a correspondence $\sigma$ between $m$ and a point on the elliptic curve $C_E$. Choose $j$, $1 \le j \le k$; then $n = mk + j$ corresponds to an element of $\mathbb{F}_q$, namely $\tau(n) = x_j \in \mathbb{F}_q$. For each $x_j$, consider the equation
$$y^2 = f(x_j) = x_j^3 + ax_j + b. \tag{6.16}$$
If this equation has a solution, let $y_j$ be one of the solutions; then $P_m = (x_j, y_j) \in C_E$, and we let $\sigma(m) = P_m$. The inverse mapping $\sigma^{-1}$ of $\sigma$ is
$$\sigma^{-1}(P_m) = \left[\frac{\tau^{-1}(x_j) - 1}{k}\right], \tag{6.17}$$
where $\tau$ is the one-to-one correspondence of Lemma 6.3 and $[\cdot]$ denotes the integer part. Because $\tau^{-1}(x_j) = mk + j$ with $1 \le j \le k$, we have
$$\left[\frac{mk + j - 1}{k}\right] = m,$$
so $\sigma^{-1}$ is exactly the inverse mapping of $\sigma$. Thus $\sigma$ establishes a one-to-one correspondence between each $m$ and a point on the elliptic curve; $\sigma$ is called a probabilistic algorithm.
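For a prime field ($q = p$, $r = 1$, so $\tau(n) = n$) the maps $\sigma$ and $\sigma^{-1}$ take only a few lines. The Python sketch below additionally assumes $p \equiv 3 \pmod 4$, so that a square root of a quadratic residue $v$ is simply $v^{(p+1)/4}$. The prime $p = 10007$, the curve $y^2 = x^3 + 2x + 3$, $k = 30$ and $M = 300$ are illustrative choices satisfying $q > kM$. Whereas the text allows any $j$, the sketch tries $j = 1, 2, \ldots, k$ in order and stops at the first success.

```python
P_MOD = 10007        # prime with P_MOD % 4 == 3 (illustrative choice)
A, B = 2, 3          # curve y^2 = x^3 + A x + B over F_p (illustrative choice)
K = 30               # trials per message; failure probability about 2**-K
M = 300              # messages 0 <= m < M; need K * M < P_MOD


def sigma(m):
    """Map a message m to a point P_m = (x_j, y_j) with x_j = m K + j, as in (6.16)."""
    for j in range(1, K + 1):
        x = m * K + j
        v = (x ** 3 + A * x + B) % P_MOD
        if v == 0 or pow(v, (P_MOD - 1) // 2, P_MOD) == 1:   # v is a square
            y = pow(v, (P_MOD + 1) // 4, P_MOD)              # square root, as p = 3 mod 4
            return (x, y)
    raise ValueError("no j in 1..K gave a square; choose a larger K")


def sigma_inverse(point):
    """Recover m = [(x - 1) / K] from P_m, formula (6.17)."""
    x, _ = point
    return (x - 1) // K


Pm = sigma(123)
print(Pm, sigma_inverse(Pm))
```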


Lemma 6.4 The probabilistic algorithm $\sigma$ succeeds with probability at least $1 - \frac{1}{2^k}$, that is,
$$P\{\sigma \text{ produces the one-to-one correspondence between } m \text{ and a point}\} \ge 1 - \frac{1}{2^k}.$$

Proof Given $m$, $0 \le m < M$, let $n = mk + j$, where $k$ is the given positive integer and $1 \le j \le k$. By Lemma 6.3, $\tau(n) = x_j \in \mathbb{F}_q$. The probability that $f(x_j) = x_j^3 + ax_j + b$ is a square is about $\frac{1}{2}$ (see Remark 6.3); in other words, the probability that (6.16) has a solution in $\mathbb{F}_q$ is about $\frac{1}{2}$, so the probability of no solution is also about $\frac{1}{2}$. Treating the choices of $j$, $1 \le j \le k$, as independent random trials, each fails (Eq. (6.16) has no solution) with probability $\frac{1}{2}$, so all $k$ trials fail with probability $\frac{1}{2^k}$. As soon as (6.16) has a solution, $P_m = (x_j, y_j) \in C_E$, and we obtain the correspondence $\sigma(m) = P_m$ between $m$ and a point of $C_E$. Thus
$$P\{\sigma \text{ succeeds}\} \ge 1 - \frac{1}{2^k}.$$
This completes the proof of the lemma.


Remark 6.3 The probability that $f(x_j) = x_j^3 + ax_j + b$ is a square, that is, the probability that Eq. (6.16) has a solution, is approximately $N_q/2q$, where $N_q$ is the number of points of $C_E$. By Hasse's theorem, $N_q/2q$ is very close to $\frac{1}{2}$.


Definition 6.3 Let $C_E$ be an elliptic curve over a finite field $\mathbb{F}_q$ and $B \in C_E$ a point. For any point $P$ on $C_E$, if there is an integer $x$ such that $xB = P$, then $x$ is called the discrete logarithm of $P$ to the base $B$.


With the above preparation, we can establish elliptic curve public key cryptosystems.

Diffie-Hellman key conversion principle

Symmetric cryptosystems, also known as classical or traditional cryptosystems, were the mainstream cryptosystems before the advent of public key cryptosystems. They are highly efficient because encryption and decryption use the same key (for example DES, the Data Encryption Standard issued by the U.S. National Bureau of Standards in 1977). When Diffie and Hellman proposed asymmetric cryptosystems, they pointed out that symmetric and asymmetric cryptosystems are not completely separate: the two are interrelated and can even be used together. The Diffie-Hellman key conversion principle is based on the following mathematical facts.
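Lemma 6.5 Let $p$ be a prime, $q = p^r$, $\mathbb{F}_q$ the finite field with $q$ elements, $\mathbb{Z}_{p^r}$ the residue class ring mod $p^r$, and $\mathbb{F}_p^r$ the $r$-dimensional vector space over $\mathbb{F}_p$. Then $\mathbb{F}_q$, $\mathbb{Z}_{p^r}$ and $\mathbb{F}_p^r$ are in one-to-one correspondence with each other.

Proof $\mathbb{F}_q$ is an extension of degree $r$ of $\mathbb{F}_p$, so the additive group $\mathbb{F}_q^+$ of $\mathbb{F}_q$ is isomorphic to $\mathbb{F}_p^r$, that is, $\mathbb{F}_q^+ \cong \mathbb{F}_p^r$; therefore there is a one-to-one correspondence between $\mathbb{F}_q$ and $\mathbb{F}_p^r$. For each $a = (a_0, a_1, \ldots, a_{r-1}) \in \mathbb{F}_p^r$, define
$$\varphi(a) = a_0 + a_1 p + \cdots + a_{r-1}p^{r-1} \in \mathbb{Z}_{p^r},$$
where each $a_i$ is regarded as an integer in $\{0, 1, \ldots, p - 1\}$. Then $\varphi: \mathbb{F}_p^r \to \mathbb{Z}_{p^r}$ is both a surjection and an injection, so $\varphi$ is a one-to-one correspondence. Since there is a one-to-one correspondence between $\mathbb{F}_q$ and $\mathbb{F}_p^r$, and one between $\mathbb{F}_p^r$ and $\mathbb{Z}_{p^r}$, there is also a one-to-one correspondence between $\mathbb{F}_q$ and $\mathbb{Z}_{p^r}$. The lemma holds.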


From the above lemma, we have the following conclusion.


Lemma 6.6 Let $N$ be a positive integer and $\mathbb{Z}_N$ the residue class ring mod $N$. Then for any prime $p$ there is a finite field $\mathbb{F}_{p^r}$ and an injection $\sigma: \mathbb{Z}_N \to \mathbb{F}_{p^r}$; this injection is also called an embedding.

Proof Given $N$ and any prime $p$, express $N$ in base $p$; then there is a positive integer $r \ge 1$ such that $p^{r-1} \le N < p^r$. We write
$$\mathbb{Z}_N = \{0, 1, 2, \ldots, N - 1\} \subset \{0, 1, 2, \ldots, N - 1, N, \ldots, p^r - 1\} = \mathbb{Z}_{p^r}.$$
That is, $\mathbb{Z}_N$ is regarded as a subset of $\mathbb{Z}_{p^r}$. Let $\sigma: \mathbb{Z}_{p^r} \to \mathbb{F}_{p^r}$ be the one-to-one correspondence of Lemma 6.5; its restriction to $\mathbb{Z}_N$ is an injection $\mathbb{Z}_N \to \mathbb{F}_{p^r}$. The lemma holds.


From the above results, we can establish the Diffie-Hellman key conversion principle. Because the keys of a symmetric cryptosystem are numbers in $\mathbb{Z}_N$, each number in $\mathbb{Z}_N$ can be embedded into a finite field $\mathbb{F}_q$ by Lemma 6.6. Therefore, discrete logarithms on $\mathbb{F}_q$ can be used to encrypt each embedded number asymmetrically, so that the two cryptosystems can be combined with each other.

Taking the affine cryptosystem introduced in Chap. 4 as an example, let $A$ be a $k \times k$ invertible matrix over $\mathbb{Z}_N$ and $b = (b_1, b_2, \ldots, b_k) \in \mathbb{Z}_N^k$ a given vector. The affine transformation $f = (A, b)$ gives the encryption algorithm for each plaintext unit $m = m_1 m_2 \cdots m_k \in \mathbb{Z}_N^k$:
$$f(m) = c = A\begin{pmatrix} m_1 \\ \vdots \\ m_k \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_k \end{pmatrix}.$$
Let $A = (a_{ij})_{k \times k}$ with each $a_{ij} \in \mathbb{Z}_N$. By Lemma 6.6, we can embed each $a_{ij}$ into a finite field $\mathbb{F}_q$ and encrypt it again using the discrete logarithm algorithm on $\mathbb{F}_q$, so that the two cryptosystems are effectively combined.
In the elliptic curve setting, the workflow of Diffie-Hellman elliptic curve cryptography is as follows. First, the users select a public finite field $\mathbb{F}_q$ and an elliptic curve $C_E$ over $\mathbb{F}_q$, and randomly select a point $P \in C_E$. Let $P = (x, y)$; then $x \in \mathbb{F}_q$. By Lemma 6.5, $x$ corresponds to an $r$-dimensional vector $(a_0, a_1, \ldots, a_{r-1})$ in $\mathbb{F}_p^r$ (where $q = p^r$); regard $(a_0, a_1, \ldots, a_{r-1})$ as a $p$-ary number, that is,
$$(a_0, a_1, \ldots, a_{r-1}) \mapsto a_0 + a_1 p + \cdots + a_{r-1}p^{r-1}.$$
Then this number can be used as the key of other cryptosystems, especially symmetric cryptosystems.

Secondly, the users select a common point $B \in C_E$ which, as in the finite field case, serves as the base of discrete logarithms in the Mordell group. The difference from the finite field case is that the Mordell group of an elliptic curve need not be cyclic, so the point $B$ need not be a generator of the Mordell group. However, we require the order $o(B)$ of $B$ to be as large as possible (note that $o(B) \mid N_q$). Once $B$ is selected, the working platform of elliptic curve cryptography is actually the subgroup $\langle B \rangle$ generated by $B$.

To generate keys, each user randomly selects an integer $a$ of roughly the same size as $N_q$ as his or her private key; $a$ must be kept strictly confidential. The user computes $aB = A \in C_E$; the point $A$ is the user's public key. Thus each user has a public key $(A, B)$ and a private key $(a, B)$.
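The key-generation step above, combined with the first step (turning a point into a symmetric key), gives the standard elliptic curve Diffie-Hellman exchange: user A publishes $aB$, user B publishes its own multiple of $B$, and each side multiplies the other's public point by its own private integer, so both obtain the same point, whose $x$-coordinate can serve as the shared symmetric key. The Python sketch below is a toy illustration only; the field $\mathbb{F}_{97}$, the curve $y^2 = x^3 + 2x + 3$, the base-point search and the random seed are our own choices, and real systems use standardized curves of cryptographic size.

```python
import random

p, a_coef, b_coef = 97, 2, 3            # toy curve y^2 = x^3 + 2x + 3 over F_97


def ec_add(P, Q):
    """Group law on the toy curve (formulas (6.4), (6.5)); None is the point O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if x1 != x2:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    else:
        lam = (3 * x1 * x1 + a_coef) * pow(2 * y1 % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (-y1 + lam * (x1 - x3)) % p)


def mult(k, P):
    """kP (k >= 0) by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R


def order(P):
    """Order of the point P, by brute force (toy sizes only)."""
    k, Q = 1, P
    while Q is not None:
        Q = ec_add(Q, P)
        k += 1
    return k


points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - x ** 3 - a_coef * x - b_coef) % p == 0]
base = max(points, key=order)            # public base point B of largest order

rng = random.Random(2024)
n_base = order(base)
secret_A = rng.randrange(1, n_base)      # A's private key a
secret_B = rng.randrange(1, n_base)      # B's private key
public_A = mult(secret_A, base)          # published by A
public_B = mult(secret_B, base)          # published by B

shared_A = mult(secret_A, public_B)      # computed by A
shared_B = mult(secret_B, public_A)      # computed by B
assert shared_A == shared_B
key = None if shared_A is None else shared_A[0]   # x-coordinate as symmetric key
print(key)
```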

Massey-Omura elliptic curve cryptography

To encrypt and send a plaintext unit $m$ $(0 \le m < M)$, $m$ is mapped to a unique point $P_m \in C_E$ on the elliptic curve $C_E$ by the probabilistic algorithm introduced earlier. Let $N = N_q = |C_E|$; that is, the order of the Mordell group is known. Each user randomly selects an integer $e$ satisfying
$$1 \le e < N, \qquad (e, N) = 1,$$
and computes $d = e^{-1} \bmod N$ by the Euclidean algorithm, that is, $de \equiv 1 \pmod N$ with $1 \le d < N$.

Suppose user A wants to encrypt and send the plaintext message $P_m$ to user B, where $(e_A, d_A)$ and $(e_B, d_B)$ are the private keys of A and B respectively. First, A sends the message $e_A P_m$ to B, and then B returns the message $e_B e_A P_m$ to A. A can now use the private key $d_A$: because $N P_m = O$ and $d_A e_A \equiv 1 \pmod N$,
$$d_A e_B e_A P_m = e_B P_m.$$
Finally, user A sends the result $e_B P_m$ to B, and user B can read A's original message $P_m$ using the private key $d_B$, because $d_B e_B \equiv 1 \pmod N$, so
$$d_B e_B P_m = P_m.$$
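The three passes of the Massey-Omura protocol can be traced in a toy Python sketch. As in the text, it assumes that the group order $N = N_q$ is known; here $N$ is computed by brute force over the small field $\mathbb{F}_{97}$ for the curve $y^2 = x^3 + 2x + 3$, both our own choices. It uses `pow(e, -1, N)` (Python 3.8+) for $d = e^{-1} \bmod N$ in place of an explicit Euclidean algorithm, and the plaintext point is taken directly from the curve rather than produced by the probabilistic embedding.

```python
import random
from math import gcd

p, a_coef, b_coef = 97, 2, 3            # toy curve y^2 = x^3 + 2x + 3 over F_97


def ec_add(P, Q):
    """Group law on the toy curve (formulas (6.4), (6.5)); None is the point O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if x1 != x2:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    else:
        lam = (3 * x1 * x1 + a_coef) * pow(2 * y1 % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (-y1 + lam * (x1 - x3)) % p)


def mult(k, P):
    """kP (k >= 0) by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R


points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - x ** 3 - a_coef * x - b_coef) % p == 0]
N = len(points) + 1                      # order of the Mordell group, including O


def keygen(rng):
    """Private pair (e, d) with gcd(e, N) = 1 and d e = 1 (mod N)."""
    while True:
        e = rng.randrange(2, N)
        if gcd(e, N) == 1:
            return e, pow(e, -1, N)


rng = random.Random(7)
e_A, d_A = keygen(rng)
e_B, d_B = keygen(rng)
P_m = points[5]                          # plaintext point (assumed already embedded)

msg1 = mult(e_A, P_m)                    # A -> B: e_A P_m
msg2 = mult(e_B, msg1)                   # B -> A: e_B e_A P_m
msg3 = mult(d_A, msg2)                   # A -> B: d_A e_B e_A P_m = e_B P_m
recovered = mult(d_B, msg3)              # B: d_B e_B P_m = P_m
assert recovered == P_m
```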




It should be noted that when user B receives the message $e_A P_m$ sent by A for the first time, it is given to B only as a point $Q = e_A P_m$ on the elliptic curve; unless B computes a discrete logarithm, B cannot learn $e_A$ or $d_A$. Even though user B eventually knows the plaintext $P_m$, computing the discrete logarithm of $Q$ to the base $P_m$ is very difficult. Similarly, when user A receives the reply from user B and computes $e_B P_m$, A cannot learn B's private key $(e_B, d_B)$.

ElGamal elliptic curve cryptography

The ElGamal cryptosystem is another elliptic curve cryptosystem, completely different from the Massey-Omura cryptosystem. In this system, the order $N$ of the Mordell group of the elliptic curve need not be known. All users jointly select a fixed finite field $\mathbb{F}_q$, an elliptic curve $C_E$ over $\mathbb{F}_q$ and a fixed point $B \in C_E$ as the base of discrete logarithms. Each user randomly selects an integer $a$ $(0 < a < N_q)$ as the private key, computes $Q = aB \in C_E$ and publishes it. The workflow is as follows.

Suppose user A wants to encrypt and send a plaintext unit $P_m$ to user B. The public key of A is $Q_A = a_A B$ and its private key is $a_A$; the public key of B is $Q_B = a_B B$ and its private key is $a_B$. The encryption algorithm from A to B is
$$f(m) = f(P_m) = (kB,\ P_m + kQ_B) = c. \tag{6.18}$$
To decrypt, user B multiplies the first component by the private key $a_B$ and subtracts the result from the second component, that is,
$$f^{-1}(c) = P_m + kQ_B - a_B(kB). \tag{6.19}$$
Because $Q_B = a_B B$, we have
$$f^{-1}(c) = P_m + ka_B B - ka_B B = P_m.$$
Here $k$ is an integer randomly selected by user A. This integer $k$ does not appear in the ciphertext $c$; it is a layer of "mask" added by user A to protect the plaintext $P_m$. In fact, the ciphertext $c = (A_1, A_2)$ received by user B consists of two points on the elliptic curve $C_E$, where
$$A_1 = kB, \qquad A_2 = P_m + kQ_B = P_m + k(a_B B).$$

The second component alone does not reveal the plaintext: subtracting B's public key $Q_B = a_B B$ gives
$$A_2 - a_B B = P_m + k(a_B B) - a_B B = P_m + (k - 1)(a_B B) \neq P_m$$
unless $(k - 1)(a_B B) = O$. Removing the mask requires computing $a_B A_1 = k(a_B B)$, which uses the private key $a_B$; an eavesdropper who knows only $B$, $Q_B$ and $A_1 = kB$ essentially faces a discrete logarithm problem.
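Encryption (6.18) and decryption (6.19) can be traced in a toy Python sketch. The field $\mathbb{F}_{97}$, the curve $y^2 = x^3 + 2x + 3$, the base point and the random seed are our own illustrative choices. Point subtraction is implemented as addition of the negative $-P = (x, -y)$ from Definition 6.2, and, as stated in the text, the group order is never used: the random integers are simply drawn from a fixed range.

```python
import random

p, a_coef, b_coef = 97, 2, 3            # toy curve y^2 = x^3 + 2x + 3 over F_97


def ec_add(P, Q):
    """Group law on the toy curve (formulas (6.4), (6.5)); None is the point O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if x1 != x2:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    else:
        lam = (3 * x1 * x1 + a_coef) * pow(2 * y1 % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (-y1 + lam * (x1 - x3)) % p)


def ec_neg(P):
    """-P = (x, -y); -O = O."""
    return None if P is None else (P[0], (-P[1]) % p)


def mult(k, P):
    """kP (k >= 0) by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R


points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - x ** 3 - a_coef * x - b_coef) % p == 0]
base = points[0]                          # public base point B (illustrative)

rng = random.Random(11)
a_B = rng.randrange(1, 1000)              # B's private key
Q_B = mult(a_B, base)                     # B's public key


def encrypt(P_m, Q_B, rng):
    """(6.18): c = (kB, P_m + kQ_B) with a fresh random mask k."""
    k = rng.randrange(1, 1000)
    return mult(k, base), ec_add(P_m, mult(k, Q_B))


def decrypt(c, a_B):
    """(6.19): P_m = A_2 - a_B A_1."""
    A1, A2 = c
    return ec_add(A2, ec_neg(mult(a_B, A1)))


P_m = points[10]
c = encrypt(P_m, Q_B, rng)
assert decrypt(c, a_B) == P_m
```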
The two elliptic curve cryptosystems introduced above are both based on a selected elliptic curve $C_E$ and a point $B$ on $C_E$ as the base of discrete logarithms. How to select $C_E$ and $B$ randomly needs further discussion.


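Lemma 6.7 Let $x^3 + ax + b \in \mathbb{F}_q[x]$ be a cubic polynomial. Then $x^3 + ax + b = 0$ has no multiple roots in its splitting field if and only if the discriminant satisfies $4a^3 + 27b^2 \neq 0$.

Proof This follows directly from the root formula of the cubic equation; equivalently, the discriminant of $x^3 + ax + b$ is $-(4a^3 + 27b^2)$, and a polynomial has a multiple root if and only if its discriminant vanishes.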


To randomly select an elliptic curve over $\mathbb{F}_q$ with $\chi(\mathbb{F}_q) \neq 2, 3$, where $C_E$ is determined by the equation $y^2 = x^3 + ax + b$, randomly select three elements $x_0, y_0, a \in \mathbb{F}_q$ and let
$$b = y_0^2 - (x_0^3 + ax_0).$$
Check whether $f(x) = x^3 + ax + b$ has multiple roots; by Lemma 6.7, it suffices to check whether the discriminant $4a^3 + 27b^2$ is 0. If $f(x)$ has no multiple roots, select the elliptic curve $y^2 = x^3 + ax + b$; then $(x_0, y_0) \in C_E$ is a point on it, and we let $B = (x_0, y_0)$ be the base of discrete logarithms. Similarly, for $q = 2^r$ or $q = 3^r$, we can also randomly choose an elliptic curve $C_E$ and determine the base $B \in C_E$ of discrete logarithms at the same time.
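The random choice of $(x_0, y_0, a)$ described above can be sketched in Python for a prime field $\mathbb{F}_p$ with $p > 3$; the prime $10007$, the seed and the function name `random_curve_and_point` are our own illustrative choices.

```python
import random


def random_curve_and_point(p, rng):
    """Pick x0, y0, a at random in F_p, set b = y0^2 - x0^3 - a x0, and retry
    until 4a^3 + 27b^2 != 0 (Lemma 6.7). Returns (a, b, B) with B = (x0, y0)."""
    while True:
        x0, y0, a = (rng.randrange(p) for _ in range(3))
        b = (y0 * y0 - x0 ** 3 - a * x0) % p
        if (4 * a ** 3 + 27 * b * b) % p != 0:
            return a, b, (x0, y0)


rng = random.Random(1)
a, b, B = random_curve_and_point(10007, rng)
print(a, b, B)
```

By construction, $B$ lies on the chosen curve, so no square root has to be extracted.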

It should be noted that, at present, there is no simple algorithm for computing the number of points $N_q$ of an arbitrary elliptic curve. Special algorithms such as Schoof's algorithm run in polynomial time, but they are quite complex and lengthy in practical applications.

Now we introduce a second method of selecting elliptic curves, called the mod $p$
method. An elliptic curve $C_E$ over a field $E$ such as $E = \mathbb{R}, \mathbb{Q}, \mathbb{C}$
is called a global curve. We use the mod $p$ method to convert a global curve into a
“local” curve. First, select a global curve $C_E$ over the rational number field $E = \mathbb{Q}$ and a point $B \in C_E$,
where $B$ is an element of the Mordell group of $C_E$ whose additive order is $\infty$:


$$C_E: y^2 = x^3 + ax + b, \quad a, b \in \mathbb{Q}.$$


Let $p$ be a prime number that does not divide the denominators of $a$ and
$b$; then we obtain an elliptic curve over $\mathbb{F}_p$,


$$C_E \bmod p: y^2 \equiv x^3 + ax + b \pmod p, \quad a, b \in \mathbb{F}_p,$$


and a point $B \bmod p$ on $C_E \bmod p$. When localizing an elliptic curve, the choice of the
prime $p$ only needs to satisfy


$$p \nmid \text{the denominators of } a \text{ and } b, \quad \text{and} \quad 4a^3 + 27b^2 \not\equiv 0 \pmod p.$$


In fact, we can further require

$$N_p = |C_E \bmod p| = \text{prime}. \tag{6.20}$$


In this way, the Mordell group of $C_E \bmod p$ is a cyclic group of prime order, and any finite point
of $C_E \bmod p$ is a generator of the group. At present, there is no deterministic
algorithm for selecting a prime $p$ satisfying (6.20), and it is generally
conjectured that a probabilistic algorithm with success probability $> O\left(\frac{1}{\log p}\right)$
exists.
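For small primes the search for $p$ satisfying (6.20) can be sketched by brute force. The example assumes the global curve $y^2 = x^3 - 2$ (so $a = 0$, $b = -2$, $4a^3 + 27b^2 = 108$) with the rational point $B = (3, 5)$; because the coefficients are integers, only $p \nmid 4a^3 + 27b^2$ needs to be checked. All names are hypothetical, and the naive point count is usable only for small $p$.

```python
def is_prime(m):
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def count_points(a, b, p):
    """Naive count of points on y^2 = x^3 + a x + b over F_p, including infinity."""
    total = 1
    for x in range(p):
        f = (x ** 3 + a * x + b) % p
        total += 1 if f == 0 else (2 if pow(f, (p - 1) // 2, p) == 1 else 0)
    return total


def primes_with_prime_order(a, b, bound):
    """Primes 3 < p < bound with p not dividing 4a^3 + 27b^2 and N_p prime, cf. (6.20)."""
    disc = 4 * a ** 3 + 27 * b ** 2
    return [p for p in range(5, bound)
            if is_prime(p) and disc % p != 0 and is_prime(count_points(a, b, p))]


good = primes_with_prime_order(0, -2, 50)   # global curve y^2 = x^3 - 2, B = (3, 5)
```

For instance, $p = 7$ qualifies: $N_7 = 7$, so $B \bmod 7 = (3, 5)$ generates the whole group $C_E \bmod 7$.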


6.3 Elliptic Curve Factorization 


In 1986, the mathematician H. W. Lenstra used elliptic curves to find a new method of
integer factorization. Lenstra's method has advantages over the older known
algorithms in many respects, which is also one of the main reasons why elliptic curves
have attracted more and more attention in cryptography. We first introduce
a classical factorization method called Pollard's $(p - 1)$ algorithm.

$(p - 1)$ algorithm

Suppose $n$ is a composite number and $p$ is a prime factor of $n$; of course, $p$ is
unknown and needs to be determined. If all prime factors of $p - 1$ happen to be
small (more precisely, $p - 1$ is a product of prime powers that are not too large), the $(p - 1)$
method aims to find such a prime factor $p$ of $n$. The $(p - 1)$ method consists of
the following steps:


1. Let $B$ be a positive integer. Select a positive integer $k$ that is a multiple of
most positive integers not exceeding $B$; for example, $k = B!$, or $k$ can be the least
common multiple of all positive integers not exceeding $B$.

2. Select an integer $a$ with $2 \le a \le n - 2$ and $(a, n) = 1$, such as $a = 2$,
$a = 3$, or any randomly selected integer.

3. Use the "repeated squaring method" to calculate the least nonnegative
residue $a^k \bmod n$ of $a^k$ modulo $n$.

4. Calculate the greatest common divisor $d = (a^k - 1, n)$ of $a^k - 1$ and $n$
by the Euclidean algorithm.

5. If $d = 1$ or $d = n$, that is, if $d$ is a trivial factor of $n$, reselect $a$ and repeat
steps 1 to 4 above.


To explain why the $(p - 1)$ algorithm works, we further assume
that $k$ is a multiple of all positive integers not exceeding $B$, and that $p \mid n$ with


$$p - 1 = \prod_i p_i^{\alpha_i}, \quad \text{where } p_i^{\alpha_i} \le B \text{ for all } i. \tag{6.21}$$

Then $p - 1 \mid k$. By Fermat's congruence theorem,
$$a^{p-1} \equiv 1 \pmod p \implies a^k \equiv 1 \pmod p.$$


So $p \mid d$, where $d = (a^k - 1, n)$.
Definition 6.4 Suppose $n$ is a composite number and $p \mid n$. Let $B$ be an arbitrarily chosen,
sufficiently large positive integer; $p$ is called a $B$-smooth prime if (6.21)
holds, that is, $p - 1$ can be decomposed into a product of prime powers not exceeding
$B$.


Lemma 6.8 Suppose $n$ is a composite number and $B$ is a positive integer. If $n$ has a
$B$-smooth prime factor $p$, and $k$ and $a$ are selected according to steps 1 to 4 of the algorithm,


then $d = (a^k - 1, n) > 1$; when moreover $d < n$, we obtain the factorization $n = d \cdot \frac{n}{d}$.


Proof If $p$ is a $B$-smooth prime factor of $n$, then $p \mid (a^k - 1, n)$, thus $d > 1$.
The Lemma holds.


In the above algorithm, if $d = (a^k - 1, n) = n$, that is, $n \mid a^k - 1$, the algorithm
fails, and we must reselect $a$ and carry out a new round of testing.


Example 6.2 To factor $n = 540143$ by the $(p - 1)$ method, choose
$B = 8$ and $k = 840$, the least common multiple of $1, 2, \ldots, 8$, and let $a = 2$. Calculate the
least nonnegative residue of $2^{840}$ modulo $n$:


$$2^{840} \equiv 53047 \pmod{540143}.$$
Calculate $(2^{840} - 1, n)$:
$$d = (2^{840} - 1, n) = (53046, 540143) = 421.$$


So we have the factorization $540143 = 421 \times 1283$.
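A direct sketch of steps 1 to 5, assuming $k = \operatorname{lcm}(1, 2, \ldots, B)$: Python's three-argument `pow` performs the repeated squaring and `math.gcd` the Euclidean algorithm. The function name is hypothetical; returning `None` stands for "reselect $a$ (or $B$) and try again".

```python
from math import gcd


def pollard_p_minus_1(n, B, a=2):
    """Pollard's (p - 1) method with k = lcm(1, 2, ..., B).

    Returns a nontrivial factor of n, or None when d = 1 or d = n."""
    k = 1
    for i in range(2, B + 1):
        k = k * i // gcd(k, i)          # step 1: k = lcm(1, ..., B)
    d = gcd(a, n)
    if 1 < d < n:
        return d                        # a already shares a factor with n
    ak = pow(a, k, n)                   # step 3: repeated squaring
    d = gcd(ak - 1, n)                  # step 4: Euclidean algorithm
    return d if 1 < d < n else None     # step 5: trivial d means failure


print(pollard_p_minus_1(540143, 8))     # 421
```

This reproduces Example 6.2: $421 - 1 = 420 = 2^2 \cdot 3 \cdot 5 \cdot 7$ divides $840$, whereas $1283 - 1 = 2 \cdot 641$ does not.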


Pollard's $(p - 1)$ method essentially works in the multiplicative group $\mathbb{Z}_p^*$; if the order $p - 1$
of $\mathbb{Z}_p^*$ is divisible by a huge prime, the method does not
work. Lenstra overcame this disadvantage by using elliptic curves for factorization:
because there are many elliptic curves to choose from, we can always
hope to find one whose Mordell group order is not divisible by a huge
prime. Next, we introduce Lenstra's method in detail. First, we discuss
elliptic curves mod $n$.

Throughout, we assume that $n$ is an odd composite
number, $p \mid n$ ($p$ is unknown) and $p > 3$. Let $m$ be a positive integer and $x_1, x_2$ two
rational numbers whose denominators are coprime to $m$, and write
$x_1 - x_2 = \frac{c}{d}$ as a reduced fraction; we define


$$x_1 \equiv x_2 \pmod m \quad \text{if } m \mid c. \tag{6.22}$$


Lemma 6.9 Suppose $x_1 \in \mathbb{Q}$ is a rational number whose denominator is coprime to $m$. Then
there is a unique integer $r$ with $0 \le r < m$ such that $x_1 \equiv r \pmod m$.
$r$ is called the nonnegative residue of $x_1$ modulo $m$, denoted $r = x_1 \bmod m$.


Proof Write $x_1 = \frac{b}{a}$, where $(a, m) = 1$. For an integer $x$ we have $x_1 - x = \frac{b - ax}{a}$, and the congruence


equation $-ax + b \equiv 0 \pmod m$ has a unique solution $r$ with $0 \le r < m$. So there is a
unique $r$ such that $x_1 \equiv r \pmod m$. The Lemma holds.
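In code, the residue of Lemma 6.9 is $r = b \cdot a^{-1} \bmod m$. The sketch assumes Python 3.8 or later, where `pow(a, -1, m)` computes a modular inverse and raises `ValueError` when $(a, m) > 1$; that failure is exactly the event Lenstra's algorithm exploits. The function name is hypothetical.

```python
from fractions import Fraction


def rational_mod(x, m):
    """Nonnegative residue r = x mod m of a rational x = b/a with (a, m) = 1 (Lemma 6.9)."""
    x = Fraction(x)                     # normalized: positive denominator, reduced
    return x.numerator * pow(x.denominator, -1, m) % m


print(rational_mod(Fraction(1, 3), 7))    # 5, since 3 * 5 = 15 = 1 (mod 7)
print(rational_mod(Fraction(-5, 2), 9))   # 2, since 2 * 2 = 4 = -5 (mod 9)
```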
To randomly generate an elliptic curve $C_E$ over the rational number field


$\mathbb{Q}$, we randomly select three integers $a, x_0, y_0 \in \mathbb{Z}$ and let $b = y_0^2 - x_0^3 - ax_0$, subject to


$$\Delta = 4a^3 + 27b^2 \neq 0 \quad \text{and} \quad (\Delta, n) = 1. \tag{6.23}$$
We get an elliptic curve $C_E: y^2 = x^3 + ax + b$ with $(x_0, y_0) \in C_E$. Because
$a, b \in \mathbb{Z}$ and $\Delta = 4a^3 + 27b^2$ is coprime to $n$, we have $\Delta \not\equiv 0 \pmod p$ for every prime $p \mid n$.


Therefore, as a cubic polynomial over the finite field $\mathbb{F}_p$, $x^3 + ax + b$
has no multiple roots, so we obtain a “local” elliptic curve $C_E \bmod p$, where


$$C_E \bmod p: y^2 \equiv x^3 + ax + b \pmod p, \tag{6.24}$$


together with the point $(x_0 \bmod p, y_0 \bmod p) \in C_E \bmod p$. We denote this point
on $C_E \bmod p$ by $P$, that is,


$$P = (x_0 \bmod p, y_0 \bmod p) \in C_E \bmod p.$$


Next, we want to calculate $kP$. Just as the “repeated squaring method” speeds up
exponentiation, there is an analogous repeated doubling method for point addition.


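Lemma 6.10 When $k$ is a huge integer, the computational complexity of $kP$ is
$\mathrm{Time}(kP) = O(\log k) \cdot \mathrm{Time}(P)$, where $\mathrm{Time}(P)$ denotes the time of a single point addition.
Proof Express $k$ as a binary integer, i.e.,
$$k = a_0 + a_1 \cdot 2 + a_2 \cdot 2^2 + \cdots + a_{m-1} \cdot 2^{m-1}, \quad a_i = 0 \text{ or } 1.$$


We double continuously, that is, $2^{j+1}P = 2 \cdot (2^jP)$ $(0 \le j \le m - 2)$, and
obtain $kP$ as the sum of those $2^jP$ with $a_j = 1$. Here $m$ is the number of binary digits of $k$, $m = O(\log k)$, so


$$\mathrm{Time}(kP) = O(\log k) \cdot \mathrm{Time}(P).$$


The Lemma holds.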
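A sketch of the repeated doubling in Lemma 6.10, written for any additive group given by an `add` function and a zero element (both hypothetical parameters); with elliptic-curve point addition passed as `add`, it computes $kP$. It also counts group operations, which never exceed $2m$ for an $m$-bit $k$.

```python
def scalar_multiply(k, P, add, zero):
    """Compute k*P by repeated doubling; returns (k*P, number of group additions)."""
    result, ops = zero, 0
    while k > 0:
        if k & 1:                       # binary digit a_j = 1: add 2^j P
            result = add(result, P)
            ops += 1
        P = add(P, P)                   # 2^{j+1} P = 2 * (2^j P)
        ops += 1
        k >>= 1
    return result, ops


# The integers under addition stand in for C_E only so that the result is easy to check.
value, ops = scalar_multiply(1000003, 7, lambda u, v: u + v, 0)
print(value)   # 7000021
```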


Theorem 6.2 Let $C_E$ be an elliptic curve over the rational field $\mathbb{Q}$ defined by the equation
$y^2 = x^3 + ax + b$, where $a, b \in \mathbb{Z}$ and $(4a^3 + 27b^2, n) = 1$. Let $P_1$ and $P_2$
be two points on $C_E$ whose coordinates have denominators coprime to $n$, with $P_1 \neq -P_2$,
so that $P_1 + P_2 \in C_E$ is a finite point. Let $P_1 + P_2 = (x, y)$. Then the denominators of $x$ and $y$
are coprime to $n$ if and only if there is no prime factor $p \mid n$ of $n$ such that, for the
points $P_1 \bmod p$ and $P_2 \bmod p$ on the local curve $C_E \bmod p$,


$$P_1 \bmod p + P_2 \bmod p = O,$$

where $O$ is the point at infinity.


Proof Let $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ be the two points on $C_E$, and $P_1 + P_2 = (x, y)$.
Suppose the denominators of $x$ and $y$ are coprime to $n$; we have to prove
$$P_1 \bmod p + P_2 \bmod p \neq O, \quad \forall\, p \mid n. \tag{6.25}$$


If $x_1 \not\equiv x_2 \pmod p$, then (6.25) follows directly from the addition formula (6.4).
So we may assume $x_1 \equiv x_2 \pmod p$. If $P_1 = P_2$, then $x_1 = x_2$, $y_1 = y_2$, and we only
need $p \nmid 2y_1$. Suppose $p \mid 2y_1$. The coordinates of $2P_1 = (x, y)$ are determined by
the doubling formula (6.5):

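$$x = \left(\frac{3x_1^2 + a}{2y_1}\right)^2 - 2x_1, \qquad y = -y_1 + \frac{3x_1^2 + a}{2y_1}(x_1 - x).$$


Since the denominator of $x$ is coprime to $p$ and $p \mid 2y_1$, we get $3x_1^2 + a \equiv 0 \pmod p$.
Because $n$ is odd, $p \mid y_1$, and since $x_1^3 + ax_1 + b = y_1^2$ we have


$$x_1^3 + ax_1 + b \equiv 0 \pmod p, \qquad 3x_1^2 + a \equiv 0 \pmod p.$$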


That is, $x_1$ is a common root of $f(x) = x^3 + ax + b$ and its derivative $f'(x) = 3x^2 + a$ modulo $p$.
This contradicts $(4a^3 + 27b^2, n) = 1$. So we may assume $P_1 \neq P_2$. Now
$x_1 \equiv x_2 \pmod p$ but $x_1 \neq x_2$ (because $P_1 \neq \pm P_2$), so we can write


$$x_2 = x_1 + tp^r, \quad r \ge 1,$$


where the numerator and denominator of $t$ are coprime to $p$. From (6.4) it can be deduced that
$$y_2 = y_1 + sp^r.$$


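On the other hand, since $y_2^2 = x_2^3 + ax_2 + b$, we have


$$\begin{aligned}
y_2^2 &= (x_1 + tp^r)^3 + a(x_1 + tp^r) + b \\
&\equiv x_1^3 + ax_1 + b + tp^r(3x_1^2 + a) \pmod{p^{r+1}} \\
&\equiv y_1^2 + tp^r(3x_1^2 + a) \pmod{p^{r+1}}.
\end{aligned} \tag{6.26}$$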


Since $x_1 \equiv x_2 \pmod p$ and $y_1 \equiv y_2 \pmod p$, we have
$$P_1 \bmod p + P_2 \bmod p = 2(P_1 \bmod p).$$


This is the point at infinity $O$ if and only if $y_1 \equiv y_2 \equiv 0 \pmod p$. If $y_1 \equiv y_2 \equiv
0 \pmod p$, then $y_2^2 - y_1^2 = (y_2 - y_1)(y_2 + y_1)$ is divisible by $p^{r+1}$. Therefore,
(6.26) yields $3x_1^2 + a \equiv 0 \pmod p$, which is impossible: since $x^3 + ax + b \pmod p$ has no
multiple roots, $x_1$ cannot be a common root of $x^3 + ax + b$ and its derivative
$3x^2 + a$ modulo $p$. This proves that (6.25) holds under the assumption.
Conversely, suppose (6.25) holds; we prove that the denominators of $P_1 + P_2$ are coprime to
$n$. Fix $p \mid n$. If $x_1 \not\equiv x_2 \pmod p$, then by (6.4) the denominators
of $P_1 + P_2$ are coprime to $p$. So we may assume $x_1 \equiv x_2 \pmod p$; then
$y_2 \equiv \pm y_1 \pmod p$. Because $P_1 \bmod p + P_2 \bmod p \neq O$, we have $y_2 \equiv y_1 \not\equiv 0 \pmod p$.
First assume $P_2 = P_1$; then (6.5) and the fact $y_1 \not\equiv 0 \pmod p$ show
that the denominators of $P_1 + P_2 = 2P_1$ are coprime to $p$. Finally, let $P_2 \neq P_1$ and
write $x_2 = x_1 + tp^r$, where the numerator and denominator of $t$ are coprime to $p$. Using the congruence (6.26), we have

$$\frac{y_2^2 - y_1^2}{x_2 - x_1} \equiv 3x_1^2 + a \pmod p.$$


Because $p \nmid y_2 + y_1 \equiv 2y_1 \pmod p$, the denominator of


$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{y_2^2 - y_1^2}{(y_2 + y_1)(x_2 - x_1)}$$


is not divisible by $p$, so by (6.4), the denominators of $P_1 + P_2$ are not divisible by
$p$. Since $p \mid n$ is arbitrary, this completes the proof of the theorem.
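A small numerical illustration of Theorem 6.2 in the doubling case, using exact rational arithmetic from Python's `fractions` module. It assumes the curve $y^2 = x^3 - 2$ ($a = 0$, $b = -2$, so $4a^3 + 27b^2 = 108$), the point $P = (3, 5)$ and $n = 35$; these choices and the function name are illustrative only. Since $p = 5$ divides $2y_1 = 10$, we have $P \bmod 5 + P \bmod 5 = O$, and the denominator of $2P$ indeed contains $5$; for $p = 7$ neither happens.

```python
from fractions import Fraction


def double_over_q(P, a):
    """Doubling formula (6.5) in exact rational arithmetic (assumes y1 != 0)."""
    x1, y1 = Fraction(P[0]), Fraction(P[1])
    lam = (3 * x1 * x1 + a) / (2 * y1)
    x3 = lam * lam - 2 * x1
    y3 = lam * (x1 - x3) - y1
    return x3, y3


a, b, n = 0, -2, 35                          # y^2 = x^3 - 2, (4a^3 + 27b^2, n) = (108, 35) = 1
P = (3, 5)
assert P[1] ** 2 == P[0] ** 3 + a * P[0] + b     # P lies on the curve

x3, y3 = double_over_q(P, a)
print(x3, y3)                                # 129/100 -383/1000

for p in (5, 7):
    reaches_infinity = (2 * P[1]) % p == 0   # P mod p + P mod p = O  iff  p | 2*y1
    shares_p = x3.denominator % p == 0       # p divides the denominator of 2P
    print(p, reaches_infinity, shares_p)     # 5 True True, then 7 False False
```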


Lenstra algorithm. 

Let $n$ be an odd composite number; we hope to find a nontrivial factor $d$ of $n$, $d \mid n$,
$1 < d < n$, giving the factorization $n = d \cdot \frac{n}{d}$. We have already introduced the
random selection of an elliptic curve $C_E$ over the rational number field $\mathbb{Q}$ and a point $P$ on
$C_E$. Lenstra's algorithm attempts to factorize $n$ using $(C_E, P)$. The Lenstra
algorithm explained below is, without doubt, a probabilistic algorithm. If $(C_E, P)$
fails to factorize $n$, then, as long as the failure probability satisfies $p < 1$, we
select another elliptic curve and a point on it. Continuing in this way, after randomly and
independently selecting $r$ elliptic curves, the probability of successfully factorizing
$n$ is


$$P\left[n = d \cdot \frac{n}{d}\right] \ge 1 - p^r \quad (p < 1).$$


When $r$ is sufficiently large, the success probability of Lenstra's algorithm can be made
arbitrarily close to 1. Therefore, Lenstra's algorithm can be simply summarized
as an algorithm that factorizes $n$ using an arbitrary rational elliptic curve $(C_E, P)$
with failure probability $p < 1$.

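Let $(C_E, P)$ be a given rational elliptic curve, and let $B$ and $C$ be positive upper
bounds chosen in advance. Let $k$ be a product of small prime powers; to be exact,


$$k = \prod_{l \le B} l^{\alpha_l}, \tag{6.27}$$
where $l$ runs over the primes not exceeding $B$ and $\alpha_l$ is the largest exponent satisfying $l^{\alpha_l} \le C$. Thus $\alpha_l = \left\lfloor \frac{\log C}{\log l} \right\rfloor$.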


Next, we calculate $kP \pmod n$ by (6.4) and (6.5). If a denominator $x_2 - x_1$ or $2y_1$ occurring in these formulas
is not coprime to $n$, for example $d = (x_2 - x_1, n)$ with $1 < d < n$, then we have the
factorization $n = d \cdot \frac{n}{d}$. If $d = n$, we reselect the rational elliptic curve $C_E$ and the point
$P$ on it. By Theorem 6.2, $d > 1$ appears among these numbers $x_2 - x_1$ and $2y_1$ if and only if
there is a $k_1$ such that
$$k_1 \cdot (P \bmod p) = O \quad \text{for some prime } p \mid n.$$

By the choice of $k$ in (6.27), with high probability $k_1 \mid k$, and thus
$$k \cdot (P \bmod p) = O \quad \text{for some prime } p \mid n.$$


Therefore, in Lenstra's algorithm, by calculating the rational point $kP$, there is a high
probability that for some prime $p \mid n$,


$$k(P \bmod p) = O. \tag{6.28}$$


By Theorem 6.2, letting $P = (x_1, y_1)$ and $(k - 1)P = (x_2, y_2)$, we then have $d = (x_2 - x_1, n) > 1$
or $d' = (2y_1, n) > 1$, and we obtain a nontrivial factorization of $n$ (unless $d = n$).

In Lenstra's algorithm above, the key problem is to calculate $k \cdot P$. Using
the repeated doubling method of Lemma 6.10, we only need to calculate
$2P, 2(2P), 2(4P), \ldots, 2^{\alpha_2}P, 3(2^{\alpha_2}P), 3(3 \cdot 2^{\alpha_2}P), \ldots, 3^{\alpha_3}2^{\alpha_2}P, \ldots$, continuing until
$\left(\prod_{l \le B} l^{\alpha_l}\right)P$, i.e., $kP$.

For the probability estimation and computational complexity of Lenstra's algorithm,
see reference 6 (1986).
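The following is a compact sketch of Lenstra's algorithm under simplifying assumptions: arithmetic is done modulo $n$ with (6.4) and (6.5); a non-invertible denominator $x_2 - x_1$ or $2y_1$ is reported together with $d = (\text{denominator}, n)$; and, instead of random choices, it tries the curves $y^2 = x^3 + ax + 1$ through $P = (0, 1)$ for $a = 1, 2, \ldots$ (so $b = y_0^2 - x_0^3 - ax_0 = 1$). The bounds $B = 50$, $C = 1000$ and all names are hypothetical defaults, and $kP$ is built prime by prime as described above.

```python
from math import gcd


class FactorFound(Exception):
    """Raised when a denominator is not invertible modulo n; carries d = (denominator, n)."""

    def __init__(self, d):
        super().__init__(d)
        self.d = d


def inverse_mod(v, n):
    g = gcd(v % n, n)
    if g != 1:
        raise FactorFound(g)
    return pow(v, -1, n)


def add_mod_n(P, Q, a, n):
    """Formulas (6.4)/(6.5) modulo n; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inverse_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inverse_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)


def multiply_mod_n(k, P, a, n):
    R = None
    while k > 0:                                # repeated doubling (Lemma 6.10)
        if k & 1:
            R = add_mod_n(R, P, a, n)
        P = add_mod_n(P, P, a, n)
        k >>= 1
    return R


def primes_up_to(B):
    return [l for l in range(2, B + 1) if all(l % q for q in range(2, int(l ** 0.5) + 1))]


def lenstra(n, B=50, C=1000, max_curves=200):
    """Return a nontrivial factor of the odd composite n, or None if every curve fails."""
    for a in range(1, max_curves + 1):
        b = 1                                   # curve through (x0, y0) = (0, 1)
        g = gcd(4 * a ** 3 + 27 * b ** 2, n)    # condition (6.23)
        if 1 < g < n:
            return g
        if g == n:
            continue
        P = (0, 1)
        try:
            for l in primes_up_to(B):           # k = prod l^{alpha_l}, formula (6.27)
                power = l
                while power <= C:               # alpha_l multiplications by l
                    P = multiply_mod_n(l, P, a, n)
                    power *= l
        except FactorFound as exc:
            if exc.d < n:
                return exc.d                    # 1 < d < n
    return None
```

`lenstra(455839)` should return one of the factors 599 or 761 ($455839 = 599 \times 761$); which one depends on the first curve that succeeds.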


Exercise 6 


1. Let $C_E = \{(x, y) \in \mathbb{C}^2 \mid y^2 = x^3 + ax + b\}$, $a, b \in \mathbb{R}$, be a complex elliptic curve;
then $C_E(\mathbb{R})$ is a subgroup of $C_E$. Determine all subgroups of $C_E$ whose
coordinates are real numbers.

2. Determine the points of order $n$ on complex elliptic curves and on real elliptic
curves.

3. Give an example of a rational elliptic curve $C_E$ with exactly two points of
order 2 on $C_E$. Give another example in which there are exactly four points on $C_E$
of order 2.

4. Let $C_E$ be a real elliptic curve and $P \in C_E$ a finite point. Determine geometric
conditions equivalent to $o(P) = 2$, $o(P) = 3$ and $o(P) = 4$.

5. Calculate the orders of the points $P$ on the following rational elliptic curves:

(i) $P = (0, 16)$, $C_E: y^2 = x^3 + 256$;

(ii) $P = (2, 3)$, $C_E: y^2 = x^3 + 1$;

(iii) $P = (3, 8)$, $C_E: y^2 = x^3 - 43x + 166$;
(iv) $P = (0, 0)$, $C_E: y^2 + y = x^3 - x^2$.

6. Prove that each of the following elliptic curves has exactly $q + 1$ points over $\mathbb{F}_q$:
(a) $y^2 = x^3 - x$, when $q \equiv 3 \pmod 4$;

(b) $y^2 = x^3 - 1$, when $q \equiv 2 \pmod 3$ and $q$ is odd;
(c) $y^2 + y = x^3$, when $q \equiv 2 \pmod 3$.

7. Let $q = 2^r$ and let the elliptic curve $C_E$ over $\mathbb{F}_q$ be $y^2 + y = x^3$. For $P = (x, y) \in C_E$,
calculate $2P$ and $-P$. If $q = 16$, prove that every finite point on $C_E$ has order 3.

8. Give a probabilistic algorithm for finding a nonsquare element of the finite
field $\mathbb{F}_q$.




9. A deterministic algorithm can embed plaintext units into certain
$\mathbb{F}_q$-elliptic curves. Give the specific algorithm for each of the following
elliptic curves:

(1) $C_E: y^2 = x^3 - x$, when $q \equiv 3 \pmod 4$;
(2) $C_E: y^2 + y = x^3$, when $q \equiv 2 \pmod 3$.
10. Let $C_E$ be an elliptic curve over the finite field $\mathbb{F}_p$, and let $N_r$ denote the number
of points of $C_E$ over the finite field $\mathbb{F}_{p^r}$. Prove:
(i) If $p > 3$, then $N_r$ is not prime for $r > 1$.
(ii) When $p = 2, 3$, give a counterexample showing that $N_r$ can be a prime number.
11. Give an example of an elliptic curve $C_E$ that has only one point over $\mathbb{F}_4$, namely the
point at infinity. Letting $N_r$ be the number of points of $C_E$ over $\mathbb{F}_{4^r}$, show that $N_r$ is the
square of the Mersenne number $2^r - 1$.
12. Use Pollard's $(p - 1)$ method with $k = 840$ and $a = 2$ to factor $n = 53467$.
13. Let $n_k = 2^{2^k} + 1$ be a Fermat number. The following is Pepin's method for testing


whether $n_k$ is a prime number:

(i) $n_k$ is a prime if and only if there is an integer $a$ with $a^{(n_k - 1)/2} = a^{2^{2^k - 1}} \equiv -1 \pmod{n_k}$.
(ii) If $n_k$ is a prime, then exactly half of the elements $a \in \mathbb{Z}_{n_k}^*$ have the congruence property in (i).
(iii) When $k > 1$, we can always choose $a = 3$, $5$, or $7$.
Chapter 7
Lattice-Based Cryptography


7.1 Geometry of Numbers 


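Let $\mathbb{R}^n$ be the $n$-dimensional Euclidean space and $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ an
$n$-dimensional vector; $x$ can be a row vector or a column vector, depending on the situation.
If $x \in \mathbb{Z}^n$, then $x$ is called an integral point. $\mathbb{R}^{m \times n}$ denotes the set of all $m \times n$ matrices
over $\mathbb{R}$. For $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ and $y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$, define the inner
product of $x$ and $y$ as


$$(x, y) = \sum_{i=1}^n x_i y_i. \tag{7.1}$$
The length $|x|$ of the vector $x$ is defined as
$$|x| = \sqrt{(x, x)} = \left(\sum_{i=1}^n x_i^2\right)^{1/2}. \tag{7.2}$$


For $\lambda \in \mathbb{R}$, $\lambda \cdot x$ is defined as
$$\lambda x = (\lambda x_1, \lambda x_2, \ldots, \lambda x_n). \tag{7.3}$$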


If the inner product of two vectors $x$ and $y$ satisfies $(x, y) = 0$, then $x$ and $y$ are said to be
orthogonal, denoted $x \perp y$.


Lemma 7.1 Let $x, y \in \mathbb{R}^n$ and let $\lambda \in \mathbb{R}$ be any real number; then


(i) $|x| \ge 0$, and $|x| = 0$ if and only if $x = 0$ is the zero vector;
(ii) $|\lambda x| = |\lambda||x|$, $\forall\, x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}$;
(iii) (Triangle inequality) $|x + y| \le |x| + |y|$ and $|x - y| \ge \big||x| - |y|\big|$;
(iv) (Pythagorean theorem) $x \perp y$ if and only if


$$|x \pm y|^2 = |x|^2 + |y|^2.$$


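Proof (i) and (ii) follow directly from the definition. To prove (iii), let
$x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n) \in \mathbb{R}^n$. By Hölder's (Cauchy–Schwarz) inequality,

$$\left|\sum_{i=1}^n x_i y_i\right| \le \left(\sum_{i=1}^n x_i^2\right)^{1/2}\left(\sum_{i=1}^n y_i^2\right)^{1/2} = |x||y|.$$

So

$$|x + y|^2 = \sum_{i=1}^n x_i^2 + 2\sum_{i=1}^n x_i y_i + \sum_{i=1}^n y_i^2 \le |x|^2 + 2|x||y| + |y|^2 = (|x| + |y|)^2,$$

which gives $|x + y| \le |x| + |y|$. Applying this to $x = (x - y) + y$ and to $y = (y - x) + x$ gives
$|x - y| \ge \big||x| - |y|\big|$, so (iii) holds. Next, by the definition of the inner product,

$$(x \pm y, x \pm y) = (x, x) \pm 2(x, y) + (y, y);$$

if $x \perp y$, then

$$|x \pm y|^2 = |x|^2 + |y|^2.$$

Conversely, if $x$ is not orthogonal to $y$, then $(x, y) \neq 0$, thus

$$|x \pm y|^2 \neq |x|^2 + |y|^2.$$

Lemma 7.1 holds.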


From the Pythagorean theorem, for orthogonal vectors $x \perp y$ we have the following
conclusion:
$$|x + y| = |x - y| \quad \text{if } x \perp y. \tag{7.4}$$


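Definition 7.1 Let $\mathcal{R} \subset \mathbb{R}^n$ be a subset with $0 \in \mathcal{R}$. $\mathcal{R}$ is called a symmetric convex
body of $\mathbb{R}^n$ if


(i) $x \in \mathcal{R} \implies -x \in \mathcal{R}$ (symmetry);
(ii) for $x, y \in \mathcal{R}$, $\lambda \ge 0$, $\mu \ge 0$ with $\lambda + \mu = 1$, we have $\lambda x + \mu y \in \mathcal{R}$ (convexity).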
The following is a well-known example of a symmetric convex body
defined by a set of linear inequalities. Let $A = (a_{ij})_{m \times n} \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix and
$c = (c_1, c_2, \ldots, c_m) \in \mathbb{R}^m$ with all $c_i > 0$. Define $\mathcal{R}(A, c)$ as the set of solutions
$x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ of the following $m$ linear inequalities:


$$\left|\sum_{j=1}^n a_{ij} x_j\right| \le c_i, \quad 1 \le i \le m. \tag{7.5}$$


We have 


Lemma 7.2 For any $A \in \mathbb{R}^{m \times n}$ and any positive vector $c = (c_1, c_2, \ldots, c_m) \in \mathbb{R}^m$,
$\mathcal{R}(A, c)$ is a symmetric convex body in $\mathbb{R}^n$.


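Proof Obviously the zero vector $x = (0, 0, \ldots, 0) \in \mathcal{R}(A, c)$, and $x \in \mathcal{R}(A, c) \implies
-x \in \mathcal{R}(A, c)$. So we only need to prove the convexity of $\mathcal{R}(A, c)$. Suppose $x, y \in \mathcal{R}(A, c)$,
and let

$$z = \lambda x + \mu y, \quad \lambda \ge 0, \ \mu \ge 0, \ \lambda + \mu = 1.$$


Then for any $1 \le i \le m$, we have


$$\begin{aligned}
|a_{i1}z_1 + a_{i2}z_2 + \cdots + a_{in}z_n|
&\le \lambda|a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n| + \mu|a_{i1}y_1 + a_{i2}y_2 + \cdots + a_{in}y_n| \\
&\le \lambda c_i + \mu c_i = c_i.
\end{aligned}$$


So $z = \lambda x + \mu y \in \mathcal{R}(A, c)$. Thus, $\mathcal{R}(A, c)$ is a symmetric convex body.
Lemma 7.2 holds.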


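Lemma 7.3 Let $\mathcal{R} \subset \mathbb{R}^n$ be a symmetric convex body and $x \in \mathcal{R}$; then for $|\lambda| \le 1$,
we have $\lambda x \in \mathcal{R}$.


Proof By convexity, let


$$\rho = \frac{1}{2}(1 + \lambda), \qquad \sigma = \frac{1}{2}(1 - \lambda).$$


Then $\rho \ge 0$, $\sigma \ge 0$, and $\rho + \sigma = 1$. So we have
$$\rho x + \sigma(-x) = \lambda x \in \mathcal{R}.$$
The Lemma holds.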


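Lemma 7.4 If $x, y \in \mathcal{R}$, then $\lambda x + \mu y \in \mathcal{R}$ for all real numbers $\lambda, \mu$
satisfying $|\lambda| + |\mu| \le 1$.


Proof If $\lambda = \mu = 0$, the claim is trivial since $0 \in \mathcal{R}$. Otherwise, let $\eta$ be the sign of $\lambda$ and $\zeta$ the sign of $\mu$; then by Lemma 7.3,

$$x' = \eta(|\lambda| + |\mu|)x \in \mathcal{R}, \qquad y' = \zeta(|\lambda| + |\mu|)y \in \mathcal{R}.$$


Let $\rho = \frac{|\lambda|}{|\lambda| + |\mu|}$ and $\sigma = \frac{|\mu|}{|\lambda| + |\mu|}$; then $\rho + \sigma = 1$. By definition, we have


$$\lambda x + \mu y = \rho x' + \sigma y' \in \mathcal{R},$$


thus the Lemma holds. This result is easily extended to the case of $n$
variables.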


Lemma 7.5 (Blichfeldt) Let $\mathcal{R} \subset \mathbb{R}^n$ be any region in $\mathbb{R}^n$ and $V$ the volume of
$\mathcal{R}$. If $V > 1$, then there are two different vectors $x \in \mathcal{R}$, $x' \in \mathcal{R}$ such that $x - x'$ is an
integral point (thus a nonzero integral point).


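Proof For $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, we define
$$[x] = ([x_1], [x_2], \ldots, [x_n]) \in \mathbb{Z}^n \tag{7.6}$$
and
$$\lfloor x \rceil = (\delta_1, \delta_2, \ldots, \delta_n) \in \mathbb{Z}^n, \tag{7.7}$$


where $[x_i]$ is the integer part (the greatest integer not exceeding $x_i$) and $\delta_i$ is the nearest integer to $x_i$.
For each integral point $u \in \mathbb{Z}^n$, define


$$\mathcal{R}_u = \{x \in \mathcal{R} \mid [x] = u\}$$


and
$$D_u = \{x - u \mid x \in \mathcal{R}_u\}.$$


Because $\mathcal{R}_{u_1} \cap \mathcal{R}_{u_2} = \varnothing$ if $u_1 \neq u_2$, and $\mathcal{R} = \bigcup_u \mathcal{R}_u$,


$$V = \mathrm{Vol}(\mathcal{R}) = \sum_u \mathrm{Vol}(\mathcal{R}_u) = \sum_u V_u,$$


where $V_u = \mathrm{Vol}(\mathcal{R}_u)$. Since $D_u$ is a translate of $\mathcal{R}_u$, $V_u = \mathrm{Vol}(D_u)$, and each $D_u$ lies in
$[0, 1) \times \cdots \times [0, 1)$. If the sets $D_u$ were pairwise disjoint, then


$$\sum_u V_u = \mathrm{Vol}\left(\bigcup_u D_u\right) \le \mathrm{Vol}([0, 1] \times \cdots \times [0, 1]) = 1.$$


That is, $V \le 1$, which is a contradiction. Therefore, there must be two different integral points
$u$ and $u'$ ($u \neq u'$) with $D_u \cap D_{u'} \neq \varnothing$, that is, there are $x \in \mathcal{R}_u$ and $x' \in \mathcal{R}_{u'}$ with
$x - u = x' - u'$, so $x - x' = u - u' \in \mathbb{Z}^n$ is nonzero. The Lemma holds.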


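Lemma 7.6 (Minkowski) Let $\mathcal{R}$ be a symmetric convex body whose volume satisfies
$$V = \mathrm{Vol}(\mathcal{R}) > 2^n;$$


then $\mathcal{R}$ contains at least one nonzero integral point.


Proof Let

$$\frac{1}{2}\mathcal{R} = \left\{\frac{1}{2}x \;\middle|\; x \in \mathcal{R}\right\}.$$
Thus


$$\mathrm{Vol}\left(\frac{1}{2}\mathcal{R}\right) = \frac{1}{2^n}V > 1,$$


so by Lemma 7.5 there are $x', x'' \in \frac{1}{2}\mathcal{R}$ such that $x' - x'' = u$ is a
nonzero integral point. We prove $u \in \mathcal{R}$. Write $x' = \frac{1}{2}y$, $x'' = \frac{1}{2}z$, where $y, z \in \mathcal{R}$. Then


$$u = \frac{1}{2}y - \frac{1}{2}z, \quad y \in \mathcal{R}, \ z \in \mathcal{R}.$$


By Lemma 7.4 (with $\lambda = \frac{1}{2}$, $\mu = -\frac{1}{2}$), $u \in \mathcal{R}$. The Lemma holds.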


Remark 7.1 Minkowski's conclusion above cannot be improved, that is, the condition $V > 2^n$
cannot be weakened to $V \ge 2^n$. A counterexample is


$$\mathcal{R} = \{x \in \mathbb{R}^n \mid x = (x_1, x_2, \ldots, x_n), \ |x_i| < 1 \ \forall\, i\}.$$


Obviously $\mathrm{Vol}(\mathcal{R}) = 2^n$, but $\mathcal{R}$ contains no nonzero integral point.


When $\mathrm{Vol}(\mathcal{R}) = 2^n$, to guarantee that a symmetric convex body $\mathcal{R}$ still has
nonzero integral points, we need to impose some supplementary constraints on $\mathcal{R}$.
First, we consider bounded regions. A region $\mathcal{R} \subset \mathbb{R}^n$ is called bounded if


$$\mathcal{R} \subset \{x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \mid |x_i| \le B, \ 1 \le i \le n\},$$
where $B$ is a constant.


Lemma 7.7 Let $A \in \mathbb{R}^{n \times n}$ be an invertible matrix, $d = |\det(A)| > 0$, and $c = (c_1, c_2, \ldots,
c_n) \in \mathbb{R}^n$ a positive vector, that is, $c_i > 0$ for all $i$. Then the symmetric convex body $\mathcal{R}(A, c)$
defined by (7.5) is bounded and its volume is


$$\mathrm{Vol}(\mathcal{R}(A, c)) = 2^n d^{-1} c_1 c_2 \cdots c_n.$$
Proof Let $A = (a_{ij})_{n \times n}$. Write $Ax = y$; then $x = A^{-1}y$. Let $A^{-1} = (b_{ij})_{n \times n}$;
then for any $x \in \mathcal{R}(A, c)$ and any coordinate $x_i$, we have


$$|x_i| = \left|\sum_{j=1}^n b_{ij} y_j\right| \le \sum_{j=1}^n |b_{ij}| c_j \le B,$$
where $B$ is a constant. Therefore, $\mathcal{R}(A, c)$ is a bounded set. Obviously
$$\mathrm{Vol}(\mathcal{R}(A, c)) = \int \cdots \int_{x = (x_1, x_2, \ldots, x_n) \in \mathcal{R}(A, c)} dx_1 dx_2 \cdots dx_n.$$


Making the change of variables $Ax = y$, we have


$$dx = dx_1 \cdots dx_n = \frac{1}{|\det(A)|} dy_1 dy_2 \cdots dy_n.$$


Thus


$$\mathrm{Vol}(\mathcal{R}(A, c)) = \frac{1}{|\det(A)|} \int_{-c_1}^{c_1} \cdots \int_{-c_n}^{c_n} dy_1 dy_2 \cdots dy_n = 2^n d^{-1} \prod_{i=1}^n c_i.$$


Lemma 7.7 holds.


Remark 7.2 If “$\le$” in (7.5) is replaced by “$<$” in the definition of $\mathcal{R}(A, c)$, the above lemma
still holds.


Now consider the general situation. Let $A = (a_{ij})_{m \times n}$. If $m \ge n$ and $\operatorname{rank}(A) = n$,
then $\mathcal{R}(A, c)$ defined by (7.5) is still a bounded region. Obviously, if $m < n$, or
$m = n$ and $\operatorname{rank}(A) < n$, then $\mathcal{R}(A, c)$ is an unbounded region and $V = \infty$. Therefore,
we have the following corollary.


Corollary 7.1 Let $A = (a_{ij})_{m \times n}$ with $m < n$, or $m = n$ and $\det(A) = 0$. Then for any small
$\varepsilon > 0$ and any positive vector $c = (c_1, c_2, \ldots, c_m)$ with $0 < c_i < \varepsilon$, $\mathcal{R}(A, c)$ contains a nonzero integral
point. In other words, the $m$ inequalities


$$\left|\sum_{j=1}^n a_{ij} x_j\right| < \varepsilon, \quad 1 \le i \le m,$$


have a nonzero integral solution $x = (x_1, x_2, \ldots, x_n) \in \mathbb{Z}^n$.
Proof For any given $\varepsilon > 0$, $\mathrm{Vol}(\mathcal{R}(A, c)) = \infty > 2^n$. By Lemma 7.6, $\mathcal{R}(A, c)$
contains at least one nonzero integral point.


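Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix and $c = (c_1, c_2, \ldots, c_m) \in \mathbb{R}^m$ a positive
vector, that is, $c_i > 0$ for all $i$. Write $A = (a_{ij})_{m \times n}$; $\mathcal{R}'(A, c)$ is defined as the set of solutions
$x = (x_1, x_2, \ldots, x_n)$ of the following linear inequalities:

$$\begin{aligned}
\left|\sum_{j=1}^n a_{1j} x_j\right| &\le c_1, \\
\left|\sum_{j=1}^n a_{ij} x_j\right| &< c_i, \quad i = 2, \ldots, m.
\end{aligned} \tag{7.8}$$


When $A \in \mathbb{R}^{n \times n}$ is an invertible square matrix, we discuss the nonzero integral points
in the symmetric convex body $\mathcal{R}'(A, c)$.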


Lemma 7.8 If $A \in \mathbb{R}^{n \times n}$ is an invertible matrix and $c = (c_1, c_2, \ldots, c_n)$ is a positive
vector with
$$c_1 c_2 \cdots c_n \ge |\det(A)|, \tag{7.9}$$


then $\mathcal{R}'(A, c)$ contains a nonzero integral point.


Proof When $c_1 c_2 \cdots c_n > |\det(A)|$, we have


$$\mathrm{Vol}(\mathcal{R}'(A, c)) = \frac{2^n c_1 c_2 \cdots c_n}{|\det(A)|} > 2^n,$$


so the proposition holds by Lemmas 7.6 and 7.7. Hence we only need to discuss the case when
equality holds in (7.9).

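Let $\varepsilon$ be any real number with $0 < \varepsilon < 1$. Applying the case just proved to the vector
$(c_1 + \varepsilon, c_2, \ldots, c_n)$, we obtain a nonzero integral solution $x^{(\varepsilon)} = (x_1^{(\varepsilon)}, x_2^{(\varepsilon)}, \ldots, x_n^{(\varepsilon)}) \in \mathbb{Z}^n$ satisfying


$$\begin{aligned}
\left|\sum_{j=1}^n a_{1j} x_j^{(\varepsilon)}\right| &\le c_1 + \varepsilon < c_1 + 1, \\
\left|\sum_{j=1}^n a_{ij} x_j^{(\varepsilon)}\right| &< c_i, \quad 2 \le i \le n.
\end{aligned} \tag{7.10}$$


By Lemma 7.7 applied to $\mathcal{R}(A, (c_1 + 1, c_2, \ldots, c_n))$, there is an upper bound $B$ independent of $\varepsilon$ such that


$$|x_j^{(\varepsilon)}| \le B, \quad 1 \le j \le n.$$
Only finitely many integral points satisfy this bound, so there must
be a nonzero integral point $x \neq 0$ that satisfies (7.10) for arbitrarily small $\varepsilon > 0$. Letting $\varepsilon \to 0$, we see that
$x \in \mathcal{R}'(A, c)$, and the Lemma holds.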
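The equality case of Lemma 7.8 can be checked by exhaustive search in two dimensions. The sketch below (hypothetical names) bounds the search box by $|x_i| \le \sum_j |b_{ij}| c_j$, as in the proof of Lemma 7.7, and uses the illustrative matrix $A = \begin{pmatrix} 0 & 1 \\ -1 & \sqrt{2} \end{pmatrix}$ with $c = (10, 1/10)$, so that $c_1 c_2 = |\det(A)| = 1$; for $x = (p, q)$ the conditions read $|q| \le 10$ and $|q\sqrt{2} - p| < 1/10$. Floating-point arithmetic is assumed to be accurate enough for these small numbers.

```python
import itertools
import math


def minkowski_search(A, c):
    """Nonzero integer x with |(Ax)_1| <= c_1 and |(Ax)_2| < c_2 (the set R'(A, c), 2 x 2 case)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    inv = ((a22 / det, -a12 / det), (-a21 / det, a11 / det))      # A^{-1} = (b_ij)
    box = [math.floor(abs(inv[i][0]) * c[0] + abs(inv[i][1]) * c[1]) for i in range(2)]
    for x in itertools.product(range(-box[0], box[0] + 1), range(-box[1], box[1] + 1)):
        if x == (0, 0):
            continue
        y1 = a11 * x[0] + a12 * x[1]
        y2 = a21 * x[0] + a22 * x[1]
        if abs(y1) <= c[0] and abs(y2) < c[1]:
            return x
    return None


theta, N = math.sqrt(2), 10
A = ((0, 1), (-1, theta))     # (Ax)_1 = q and (Ax)_2 = q*theta - p for x = (p, q); det(A) = 1
c = (N, 1 / N)                # c_1 * c_2 = 1 = |det(A)|: the equality case of (7.9)
p, q = minkowski_search(A, c)
```

One solution is $(p, q) = (7, 5)$, since $|5\sqrt{2} - 7| \approx 0.071$; the search may return its negative.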


In the following discussion, we impose the following restrictions on $\mathcal{R} \subset \mathbb{R}^n$:


$$\mathcal{R} \text{ is a symmetric convex body, } \mathcal{R} \text{ is bounded, and } \mathcal{R} \text{ is a closed subset of } \mathbb{R}^n. \tag{7.11}$$


Obviously, when $A$ is an invertible $n \times n$ matrix, for any positive vector
$c = (c_1, c_2, \ldots, c_n)$, $\mathcal{R}(A, c)$ satisfies the restriction (7.11), but $\mathcal{R}'(A, c)$ does


not, because $\mathcal{R}'(A, c)$ is not closed.


Definition 7.2 If $\mathcal{R} \subset \mathbb{R}^n$ satisfies the restriction (7.11), then for any $x \in \mathbb{R}^n$, define
the distance function $F(x)$ as


$$F(x) = F_{\mathcal{R}}(x) = \inf\{\lambda \mid \lambda > 0, \ \lambda^{-1}x \in \mathcal{R}\}. \tag{7.12}$$


By definition, we obviously have the following simple conclusions:


(i) $F(x) = 0 \iff x = 0$;
(ii) If $A$ is an invertible $n \times n$ matrix, the distance function defined by
$\mathcal{R}(A, c)$ is


$$F(x) = \max_{1 \le i \le n} c_i^{-1}\left|\sum_{j=1}^n a_{ij} x_j\right|. \tag{7.12'}$$


Property (i) follows from the boundedness of $\mathcal{R}$, and property (ii) follows
directly from the definition of $\mathcal{R}(A, c)$. Later we will see that $0 \le F(x) < \infty$
holds for all $x \in \mathbb{R}^n$. The main properties of the distance function $F(x)$ are given in the following
lemma.


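Lemma 7.9 If $F(x)$ is the distance function defined by a set $\mathcal{R}$ satisfying the restrictions (7.11),
then


(i) for $\lambda > 0$, $x \in \lambda\mathcal{R} \iff F(x) \le \lambda$;
(ii) $F(\lambda x) = |\lambda| F(x)$ holds for all $\lambda \in \mathbb{R}$, $x \in \mathbb{R}^n$;
(iii) $F(x + y) \le F(x) + F(y)$, $\forall\, x, y \in \mathbb{R}^n$.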


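Proof The case $x = 0$ is trivial in (i) and (ii), so let $x \neq 0$. Since $\mathcal{R}$ is closed, by the definition,
$F(x)^{-1}x \in \mathcal{R}$. Thus, if $\lambda \ge F(x)$, then by Lemma 7.3,

$$\lambda^{-1}x = \frac{F(x)}{\lambda} \cdot F(x)^{-1}x \in \mathcal{R}, \qquad \text{since } \left|\frac{F(x)}{\lambda}\right| \le 1,$$


that is, $x \in \lambda\mathcal{R}$. Conversely, if $\lambda < F(x)$, then $\lambda^{-1}x \notin \mathcal{R}$ by the definition of $F(x)$. So when
$x \in \lambda\mathcal{R}$, we must have $\lambda \ge F(x)$, and (i) holds.
(ii) is trivial for $\lambda = 0$, so let $\lambda \neq 0$. Because $|\lambda|^{-1}F(x)^{-1}\lambda x = \pm F(x)^{-1}x \in \mathcal{R}$ by symmetry, we have


$$F(\lambda x) \le |\lambda| F(x).$$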




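Conversely, let $\delta = F(\lambda x) > 0$; because $\delta^{-1}\lambda x \in \mathcal{R}$, symmetry gives $\left(\frac{\delta}{|\lambda|}\right)^{-1}x \in \mathcal{R}$, thus
$$F(x) \le \frac{\delta}{|\lambda|} \implies |\lambda| F(x) \le F(\lambda x).$$
So $F(\lambda x) = |\lambda| F(x)$, and (ii) holds.


To prove (iii), let $\mu_1 = F(x)$ and $\mu_2 = F(y)$, so that $\mu_1^{-1}x \in \mathcal{R}$ and $\mu_2^{-1}y \in \mathcal{R}$ (if $\mu_1 = 0$ or $\mu_2 = 0$, then $x = 0$ or $y = 0$, and (iii) is trivial). By
Lemma 7.4, we have


$$\frac{1}{\mu_1 + \mu_2}(x + y) = \frac{\mu_1}{\mu_1 + \mu_2}(\mu_1^{-1}x) + \frac{\mu_2}{\mu_1 + \mu_2}(\mu_2^{-1}y) \in \mathcal{R}.$$
Thus
$$F(x + y) \le \mu_1 + \mu_2.$$
The Lemma holds.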
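For $\mathcal{R}(A, c)$ the distance function has the explicit form (7.12'), so the three properties in Lemma 7.9 can be spot-checked numerically. The matrix $A$, the vector $c$ and the test vectors below are arbitrary illustrative choices, and the function names are hypothetical.

```python
def distance_function(A, c, x):
    """F(x) = max_i c_i^{-1} |sum_j a_ij x_j|, formula (7.12') for R(A, c)."""
    return max(abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) / c_i
               for row, c_i in zip(A, c))


A = [[2, 1], [1, 3]]                    # invertible, det(A) = 5
c = [1, 2]


def F(v):
    return distance_function(A, c, v)


x, y = (1, -1), (0, 1)
x_plus_y = tuple(u + w for u, w in zip(x, y))
print(F(x) <= 1)                        # True: x lies in R(A, c), Lemma 7.9 (i) with lambda = 1
print(F((3, -3)) == 3 * F(x))           # True: homogeneity, Lemma 7.9 (ii)
print(F(x_plus_y) <= F(x) + F(y))       # True: triangle inequality, Lemma 7.9 (iii)
```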


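Let the volume of $\mathcal{R} \subset \mathbb{R}^n$ be $V > 0$; then there are $n$ linearly independent vectors
$\alpha_1, \alpha_2, \ldots, \alpha_n$ in $\mathcal{R}$ forming a basis of $\mathbb{R}^n$. For any real numbers
$\mu_1, \mu_2, \ldots, \mu_n$, by Lemma 7.9, we have


$$\begin{aligned}
F(\mu_1\alpha_1 + \cdots + \mu_n\alpha_n) &\le |\mu_1|F(\alpha_1) + |\mu_2|F(\alpha_2) + \cdots + |\mu_n|F(\alpha_n) \\
&\le |\mu_1| + |\mu_2| + \cdots + |\mu_n|.
\end{aligned}$$


The second inequality holds because $\alpha_i \in \mathcal{R} \implies F(\alpha_i) \le 1$. This proves that $F(x) < \infty$ for all
$x \in \mathbb{R}^n$.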


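Corollary 7.2 Let $\mathcal{R} \subset \mathbb{R}^n$ satisfy the restrictions (7.11) with $\mathrm{Vol}(\mathcal{R}) > 0$;
then


(i) for every $x \in \mathbb{R}^n$, there is $\lambda$ such that $x \in \lambda\mathcal{R}$;
(ii) if $\{\alpha_1, \alpha_2, \ldots, \alpha_n\} \subset \mathcal{R}$ is a basis of $\mathbb{R}^n$, then


$$\left\{\sum_{i=1}^n \mu_i\alpha_i \;\middle|\; |\mu_1| + |\mu_2| + \cdots + |\mu_n| \le 1\right\} \subset \mathcal{R}.$$


Proof Because $F(x) < \infty$, (i) follows directly from (i) of Lemma 7.9, and
(ii) follows directly from Lemma 7.4.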


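Now let $j$ be a subscript, $1 \le j \le n$, and define $\lambda_j$ as
$$\lambda_j = \min\{\lambda > 0 \mid \lambda\mathcal{R} \text{ contains } j \text{ linearly independent integral points of } \mathbb{R}^n\}; \tag{7.13}$$


$\lambda_j$ is called the $j$th successive minimum of $\mathcal{R}$. By Lemma 7.3, $\lambda\mathcal{R} \subset \lambda'\mathcal{R}$ if
$0 < \lambda < \lambda'$. Therefore, as $\lambda$ increases continuously, $\lambda\mathcal{R}$ eventually contains any given finite
set of vectors (by Corollary 7.2 (i)). Hence $\lambda_j$ exists.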
Let $V$ be the volume of $\mathcal{R}$; then $\mathrm{Vol}(\lambda\mathcal{R}) = \lambda^n V$, so by Lemma 7.6 we have the following estimate for the first
successive minimum $\lambda_1$:


$$\lambda_1^n V \le 2^n. \tag{7.14}$$


For $\lambda_j$ ($j \ge 2$), there is no such explicit upper bound, but we have the following
conclusion.
conclusions. 


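Lemma 7.10 Let $\mathcal{R} \subset \mathbb{R}^n$ be a convex body satisfying the restrictions (7.11),


$V = \mathrm{Vol}(\mathcal{R})$, and let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the $n$ successive minima of $\mathcal{R}$; then
$$\frac{2^n}{n!} \le V\lambda_1\lambda_2 \cdots \lambda_n \le 2^n. \tag{7.15}$$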


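Proof We only prove the left inequality of the above formula. We successively
select linearly independent integral points $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ such that $x^{(j)} \in \lambda_j\mathcal{R}$
and $x^{(1)}, x^{(2)}, \ldots, x^{(j)}$ are linearly independent. Let $x^{(j)} = (x_{j1}, x_{j2}, \ldots, x_{jn}) \in
\mathbb{Z}^n$. Because the matrix $A = (x_{ji})_{n \times n}$ is an integer matrix with $\det(A) \neq 0$,

$$|\det(A)| \ge 1.$$


By Lemma 7.9, for any constants $\mu_1, \mu_2, \ldots, \mu_n$, we have


$$\begin{aligned}
F(\mu_1x^{(1)} + \mu_2x^{(2)} + \cdots + \mu_nx^{(n)})
&\le |\mu_1|F(x^{(1)}) + |\mu_2|F(x^{(2)}) + \cdots + |\mu_n|F(x^{(n)}) \\
&\le |\mu_1|\lambda_1 + |\mu_2|\lambda_2 + \cdots + |\mu_n|\lambda_n.
\end{aligned}$$


Thus, if $|\mu_1|\lambda_1 + |\mu_2|\lambda_2 + \cdots + |\mu_n|\lambda_n \le 1$, then
$\mu_1x^{(1)} + \mu_2x^{(2)} + \cdots + \mu_nx^{(n)} \in \mathcal{R}$. So the set
$$\mathcal{R}_1 = \{\mu_1x^{(1)} + \mu_2x^{(2)} + \cdots + \mu_nx^{(n)} \mid |\mu_1|\lambda_1 + |\mu_2|\lambda_2 + \cdots + |\mu_n|\lambda_n \le 1\} \subset \mathcal{R}.$$


The volume of $\mathcal{R}_1$ is


$$\mathrm{Vol}(\mathcal{R}_1) = |\det(A)| \int \cdots \int_{|\mu_1|\lambda_1 + \cdots + |\mu_n|\lambda_n \le 1} d\mu_1 \cdots d\mu_n = \frac{2^n}{n!} \cdot \frac{|\det(A)|}{\lambda_1\lambda_2 \cdots \lambda_n}.$$
So we have


$$\frac{2^n}{n!\,\lambda_1\lambda_2 \cdots \lambda_n} \le \mathrm{Vol}(\mathcal{R}_1) \le \mathrm{Vol}(\mathcal{R}) = V.$$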
Therefore, the left inequality of (7.15) holds. The proof of the right inequality is
quite complex and is omitted here. Interested readers can refer to the classic works
of J. W. S. Cassels (1963, 1971).


An important application of the above geometry of numbers is to solve the problem 
of rational approximation of real numbers, which is called Diophantine approxima- 
tion in classical number theory. The main conclusion of this section is the following 
simultaneous rational approximation theorem of n real numbers. 


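Theorem 7.1 Let $\theta_1, \theta_2, \ldots, \theta_n$ be any $n$ real numbers, $\theta_i \neq 0$. Then for any positive
number $N > 1$, there are integers $q \neq 0$ and $p_1, p_2, \ldots, p_n$ satisfying


$$\begin{aligned}
|q\theta_i - p_i| &< N^{-1/n}, \quad 1 \le i \le n, \\
|q| &\le N.
\end{aligned} \tag{7.16}$$


Replacing $(q, p_1, \ldots, p_n)$ by its negative if necessary, we may take $q > 0$.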


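Proof The theorem is a simple application of Minkowski's linear form
theorem (Lemma 7.8). Let $A \in \mathbb{R}^{(n+1) \times (n+1)}$ be the $(n + 1)$-order invertible square
matrix defined as

$$A = \begin{pmatrix}
0 & 0 & \cdots & 0 & 1 \\
-1 & 0 & \cdots & 0 & \theta_1 \\
0 & -1 & \cdots & 0 & \theta_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & \theta_n
\end{pmatrix},$$

so that for $x = (p_1, p_2, \ldots, p_n, q)$ the first component of $Ax$ is $q$ and the $(i + 1)$th component is $q\theta_i - p_i$.
Obviously $|\det(A)| = 1$. Let the $(n + 1)$-dimensional positive vector be $c = (N, N^{-1/n}, \ldots, N^{-1/n})$; because


$$c_1 c_2 \cdots c_{n+1} = N \cdot \left(N^{-1/n}\right)^n = 1 \ge |\det(A)|,$$


by Lemma 7.8 the symmetric convex body $\mathcal{R}'(A, c)$ defined by $A$ and $c$ contains
a nonzero integral point $x = (p_1, p_2, \ldots, p_n, q) \neq 0$, which satisfies (7.16). We prove $q \neq 0$. Because
$x \neq 0$, if $q = 0$, then $p_k \neq 0$ for some $k$ ($1 \le k \le n$); therefore, the $k$th inequality in (7.16)
produces the following contradiction:


$$1 \le |q\theta_k - p_k| < N^{-1/n} < 1.$$
So $q \neq 0$, which completes the proof of Theorem 7.1.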
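Theorem 7.1 guarantees a solution with $1 \le q \le N$, so for small $N$ a brute-force search over $q$ finds one; for each $q$ the best choice of $p_i$ is the integer nearest to $q\theta_i$. The sketch assumes $\theta = (\sqrt{2}, \sqrt{3})$ and $N = 1000$ purely for illustration, uses floating-point arithmetic, and its function name is hypothetical.

```python
import math


def simultaneous_approximation(thetas, N):
    """Smallest q in [1, N] with |q*theta_i - p_i| < N**(-1/n) for every i (cf. (7.16))."""
    n = len(thetas)
    bound = N ** (-1 / n)
    for q in range(1, int(N) + 1):
        ps = [round(q * t) for t in thetas]      # nearest integers p_i
        if all(abs(q * t - p) < bound for t, p in zip(thetas, ps)):
            return q, ps
    return None


q, ps = simultaneous_approximation([math.sqrt(2), math.sqrt(3)], 1000)
```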


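Corollary 7.3 Let $\theta_1, \ldots, \theta_n$ be any $n$ real numbers. Then for any $\varepsilon > 0$, there are
rational numbers $\frac{p_i}{q}$ $(1 \le i \le n)$ satisfying


$$\left|\theta_i - \frac{p_i}{q}\right| < \frac{\varepsilon}{q}, \quad 1 \le i \le n. \tag{7.17}$$


Proof For any given $\varepsilon > 0$, let $N > \max\{1, \varepsilon^{-n}\}$; then $N^{-1/n} < \varepsilon$, and (7.17) follows directly from
Theorem 7.1 (with $q > 0$).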


7.2 Basic Properties of Lattice 


The lattice is one of the most important concepts in modern cryptography. Most of the
so-called quantum-resistant (post-quantum) cryptosystems are lattice-based. What is
a lattice? In short, a lattice is a geometric structure in the $n$-dimensional Euclidean space $\mathbb{R}^n$; for
example, $L = \mathbb{Z}^n \subset \mathbb{R}^n$ is a lattice in $\mathbb{R}^n$, called the integer lattice
or trivial lattice. Applying a rotation (more generally, an invertible linear transformation) to $\mathbb{Z}^n$,
we obtain a general lattice in $\mathbb{R}^n$. This is a geometric description of a lattice; next, we give a
precise algebraic definition.


Definition 7.3 A nonempty subset $L \subset \mathbb{R}^n$ is called a lattice in $\mathbb{R}^n$ if


(i) $L$ is an additive subgroup of $\mathbb{R}^n$;
(ii) there is a positive constant $\lambda = \lambda(L) > 0$ such that


$$\min\{|x| \mid x \in L, \ x \neq 0\} = \lambda. \tag{7.18}$$


$\lambda = \lambda(L)$ is called the minimal distance of the lattice $L$.


By Definition 7.3, a lattice is simply a discrete additive subgroup of $\mathbb{R}^n$, and
the minimal distance $\lambda = \lambda(L)$ is the most important mathematical quantity of the
lattice. Obviously, we have


$$\lambda = \min\{|x - y| \mid x \in L, \ y \in L, \ x \neq y\}. \tag{7.19}$$


Equation (7.19) shows why $\lambda$ is called the minimal distance of a lattice.
If $x \in L$ and $|x| = \lambda$, then $x$ is called a shortest vector of $L$.
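For a lattice of very small rank, the minimal distance can be found by exhaustive search over coefficient vectors. The sketch below is only a toy: its cost is exponential in the rank, and it is exact only when some shortest vector has all coefficients in $[-R, R]$. The basis $\beta_1 = (2, 0)$, $\beta_2 = (1, 3)$, the radius $R = 5$ and the function name are illustrative assumptions. Every lattice vector with a nonzero $\beta_2$-coefficient has second coordinate of absolute value at least 3, so here $\lambda(L) = 2$.

```python
import itertools
import math


def shortest_vector_bruteforce(basis, R=5):
    """Shortest nonzero vector among sum a_i * beta_i with |a_i| <= R; returns (length, vector)."""
    dim = len(basis[0])
    best = None
    for coeffs in itertools.product(range(-R, R + 1), repeat=len(basis)):
        if not any(coeffs):
            continue                                    # skip the zero vector
        v = [sum(a * beta[i] for a, beta in zip(coeffs, basis)) for i in range(dim)]
        norm2 = sum(t * t for t in v)
        if best is None or norm2 < best[0]:
            best = (norm2, v)
    return math.sqrt(best[0]), best[1]


lam, v = shortest_vector_bruteforce([(2, 0), (1, 3)])
print(lam)    # 2.0
```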

In order to obtain a more explicit and concise mathematical expression of any 
lattice, we can regard an additive subgroup as a Z-module. First, we prove that any 
lattice is a finitely generated Z-module. 


Lemma 7.11 Let $L \subset \mathbb{R}^n$ be a lattice and $\{\alpha_1, \alpha_2, \ldots, \alpha_m\} \subset L$ a set of vectors
in $L$; then $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ is linearly independent over $\mathbb{R}$ if and only if it
is linearly independent over $\mathbb{Z}$.


Proof If $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ is linearly independent over $\mathbb{R}$, it is obviously linearly
independent over $\mathbb{Z}$. Conversely, suppose $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ is linearly independent over $\mathbb{Z}$, that is, any
linear combination

$$a_1\alpha_1 + \cdots + a_m\alpha_m = 0, \quad a_i \in \mathbb{Z},$$


implies $a_1 = a_2 = \cdots = a_m = 0$. Now suppose a linear combination over $\mathbb{R}$ equals $0$, that
is,
$$\theta_1\alpha_1 + \theta_2\alpha_2 + \cdots + \theta_m\alpha_m = 0, \quad \theta_i \in \mathbb{R}. \tag{7.20}$$
We prove $\theta_1 = \theta_2 = \cdots = \theta_m = 0$. By Lemma 7.1, for sufficiently large $N > 1$, there are integers $q$ with $1 \le q \le N$ and $p_1, p_2, \ldots, p_m$ such that
$$|q\theta_i - p_i| < N^{-1/m},\quad 1 \le i \le m.$$


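By (7.20), we have
$$|p_1\alpha_1 + p_2\alpha_2 + \cdots + p_m\alpha_m| = |(q\theta_1 - p_1)\alpha_1 + \cdots + (q\theta_m - p_m)\alpha_m| \le N^{-1/m}(|\alpha_1| + \cdots + |\alpha_m|) \le mN^{-1/m}\max_{1\le i\le m}|\alpha_i|.$$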


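Let $\lambda$ be the minimal distance of $L$ and let $\varepsilon > 0$ be a sufficiently small positive number. We choose
$$N > \max\left\{\varepsilon^{-m},\ \left(\frac{m\max_{1\le i\le m}|\alpha_i|}{\lambda}\right)^m\right\};$$
then $N^{-1/m} < \varepsilon$, and
$$mN^{-1/m}\max_{1\le i\le m}|\alpha_i| < \lambda.$$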


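Thus
$$|p_1\alpha_1 + p_2\alpha_2 + \cdots + p_m\alpha_m| < \lambda.$$
Notice that $p_1\alpha_1 + \cdots + p_m\alpha_m \in L$, so by the definition of $\lambda$, $p_1\alpha_1 + \cdots + p_m\alpha_m = 0$. Since $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ is linearly independent over $\mathbb{Z}$, we get $p_1 = p_2 = \cdots = p_m = 0$. Hence, for any $i$, $1 \le i \le m$, $|\theta_i| \le |q\theta_i| < N^{-1/m} < \varepsilon$. Since $\varepsilon$ is an arbitrarily small positive number, $\theta_1 = \theta_2 = \cdots = \theta_m = 0$. This proves that $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ is also linearly independent over $\mathbb{R}$. Lemma 7.11 holds.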


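From the above lemma, any lattice $L$ in $\mathbb{R}^n$ is a finitely generated $\mathbb{Z}$-module. Let $\{\beta_1, \beta_2, \ldots, \beta_m\} \subset L$ be a $\mathbb{Z}$-basis of $L$. By Lemma 7.11 it is linearly independent over $\mathbb{R}$, so the rank of $L$ as a $\mathbb{Z}$-module satisfies
$$\operatorname{rank}(L) = m \le n, \tag{7.21}$$
and
$$L = \left\{\sum_{i=1}^m a_i\beta_i \;\middle|\; a_i \in \mathbb{Z}\right\}. \tag{7.22}$$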

If $\{\beta_1, \beta_2, \ldots, \beta_m\}$ is a $\mathbb{Z}$-basis of $L$ and each $\beta_i$ is regarded as a column vector, then the matrix
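$$B = [\beta_1, \beta_2, \ldots, \beta_m] \in \mathbb{R}^{n\times m},\quad \operatorname{rank}(B) = m.$$
Equation (7.22) can be written as
$$L = L(B) = \{Bx \mid x \in \mathbb{Z}^m\} \subset \mathbb{R}^n. \tag{7.23}$$
We regard $L$ as a $\mathbb{Z}$-module, call $m$ the rank of the lattice $L$, $B \in \mathbb{R}^{n\times m}$ the generating matrix of $L$, and $\{\beta_1, \beta_2, \ldots, \beta_m\}$ a generating basis of $L$.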

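If $\alpha_1, \alpha_2, \ldots, \alpha_m \in \mathbb{R}^n$ are any $m$ column vectors and $A = [\alpha_1, \alpha_2, \ldots, \alpha_m] \in \mathbb{R}^{n\times m}$, the Gram matrix of $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ is defined as
$$T = (\langle\alpha_i, \alpha_j\rangle)_{m\times m}.$$
Obviously, $T = A'A$, where $A'$ is the transpose of $A$.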


Lemma 7.12 Let $A \in \mathbb{R}^{n\times m}$ and $b \in \mathbb{R}^n$ ($m \le n$ is not required). Then

(i) if $x_0 \in \mathbb{R}^m$ is a solution of $A'Ax = A'b$, then
$$|Ax_0 - b|^2 = \min_{x\in\mathbb{R}^m}|Ax - b|^2;$$
(ii) $\operatorname{rank}(A'A) = \operatorname{rank}(A)$, and the homogeneous linear systems $Ax = 0$ and $A'Ax = 0$ have the same solutions;
(iii) $A'Ax = A'b$ always has a solution $x \in \mathbb{R}^m$, and when $\operatorname{rank}(A) = m$, the solution is unique:
$$x = (A'A)^{-1}A'b.$$


Proof First we prove (i). Let $x_0 \in \mathbb{R}^m$ satisfy $A'Ax_0 = A'b$. Then for any $x \in \mathbb{R}^m$ we have
$$Ax - b = (Ax_0 - b) + A(x - x_0) = y + y_1 \in \mathbb{R}^n.$$
We prove that $y$ and $y_1$ are orthogonal vectors in $\mathbb{R}^n$. Indeed,
$$(A(x - x_0))'(Ax_0 - b) = (x - x_0)'A'(Ax_0 - b) = (x - x_0)'(A'Ax_0 - A'b) = 0.$$
So $y \perp y_1$, and by the Pythagorean theorem,
$$|Ax - b|^2 = |Ax_0 - b|^2 + |A(x - x_0)|^2 \ge |Ax_0 - b|^2.$$
So (i) holds.
To prove (ii), let $V_A$ be the solution space of $Ax = 0$ and $V_{A'A}$ the solution space of $A'Ax = 0$; we prove $V_A = V_{A'A}$. First, $V_A \subset V_{A'A}$. Conversely, let $x \in V_{A'A}$, that is, $A'Ax = 0$; then
$$x'A'Ax = 0 \Rightarrow (Ax)'Ax = \langle Ax, Ax\rangle = 0.$$
The above formula holds if and only if $Ax = 0$, so $x \in V_A$. Hence $V_A = V_{A'A}$. Notice that
$$\dim V_A = m - \operatorname{rank}(A),\qquad \dim V_{A'A} = m - \operatorname{rank}(A'A).$$


So $\operatorname{rank}(A) = \operatorname{rank}(A'A)$, and (ii) holds. To prove (iii), let $b \in \mathbb{R}^n$ be given; the rank of the augmented matrix of the linear system $A'Ax = A'b$ satisfies
$$\operatorname{rank}[A'A, A'b] = \operatorname{rank}(A'[A, b]) \le \operatorname{rank}(A') = \operatorname{rank}(A) = \operatorname{rank}(A'A).$$
Since the rank of an augmented matrix is never smaller than that of its coefficient matrix, the two ranks are equal, so the system has solutions. When $\operatorname{rank}(A) = m$, we have $\operatorname{rank}(A'A) = m$, that is, $A'A$ is an invertible square matrix of order $m$; thus
$$x = (A'A)^{-1}A'b,$$
and the solution is unique.

Lemma 7.12 holds.
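The recipe of Lemma 7.12 (iii) can be sketched as follows, assuming $\operatorname{rank}(A) = m$ so that $A'A$ is invertible. The function names are illustrative only; exact rational arithmetic via Python's `fractions.Fraction` is used so that the normal equations are solved without rounding error.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(T, c):
    """Solve T x = c for an invertible square T by Gauss-Jordan elimination over Q."""
    m = len(T)
    M = [[Fraction(v) for v in row] + [Fraction(ci)] for row, ci in zip(T, c)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

def least_squares(A, b):
    """x0 minimizing |Ax - b|^2 via the normal equations A'A x = A'b (Lemma 7.12).
    A is an n x m matrix given as a list of rows, assumed to have rank m."""
    At = transpose(A)
    T = matmul(At, A)                                           # Gram matrix A'A
    rhs = [sum(a * bi for a, bi in zip(row, b)) for row in At]  # A'b
    return solve(T, rhs)

A, b = [[1, 0], [0, 1], [1, 1]], [1, 1, 0]
x0 = least_squares(A, b)
residual = [sum(a * x for a, x in zip(row, x0)) - bi for row, bi in zip(A, b)]
print(x0, residual)   # x0 = (1/3, 1/3); the residual is orthogonal to the columns of A
```

The residual $Ax_0 - b$ is orthogonal to every column of $A$, which is exactly the orthogonality used in the proof of (i).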


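Lemma 7.13 Let $A \in \mathbb{R}^{n\times m}$ with $\operatorname{rank}(A) = m$. Then $A'A$ is a positive definite real symmetric matrix of order $m$, so there is a real orthogonal matrix $P \in \mathbb{R}^{m\times m}$ satisfying
$$P'A'AP = \begin{pmatrix}\delta_1 & & 0\\ & \ddots & \\ 0 & & \delta_m\end{pmatrix}, \tag{7.24}$$
where $\delta_i > 0$ ($1 \le i \le m$) are the $m$ eigenvalues of $A'A$.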


Proof Since $\operatorname{rank}(A) = m$, we have $m \le n$. Let $T = A'A$; then $T$ is a symmetric matrix of order $m$. For $x \in \mathbb{R}^m$ ($m$ variables), consider the quadratic form
$$x'Tx = x'A'Ax = (Ax)'(Ax) = \langle Ax, Ax\rangle \ge 0.$$
Because $\operatorname{rank}(A) = m$, $Ax = 0$ only for $x = 0$, so $x'Tx = 0$ if and only if $x = 0$. So $T$ is positive definite. By linear algebra, there is an orthogonal matrix $P$ of order $m$ such that $P'TP$ is diagonal, that is,
$$P'TP = \operatorname{diag}\{\delta_1, \delta_2, \ldots, \delta_m\}.$$
Because $P'TP$ and $T$ have the same eigenvalues, $\delta_1, \delta_2, \ldots, \delta_m$ are the eigenvalues of $T$, and $\delta_i > 0$ for every $i$ since $T$ is positive definite. The Lemma holds.
Lemma 7.12 is the least squares method of linear algebra. Its significance is that, for a given $n \times m$ matrix $A$ and a given vector $b \in \mathbb{R}^n$, it finds a vector of shortest length in the set $\{Ax - b \mid x \in \mathbb{R}^m\}$. Lemma 7.12 also gives an effective algorithm: solve the linear system $A'Ax = A'b$; if $x_0$ is a solution, then $Ax_0 - b$ is the shortest vector in that set. Lemma 7.13 is the diagonalization of a quadratic form. Now, the main result is as follows:


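Theorem 7.2 Let $L \subset \mathbb{R}^n$. Then $L$ is a lattice with $\operatorname{rank}(L) = m$ ($m \le n$) if and only if there is a real $n \times m$ matrix $B \in \mathbb{R}^{n\times m}$ with $\operatorname{rank}(B) = m$ such that
$$L = \{Bx \mid x \in \mathbb{Z}^m\} = \left\{\sum_{i=1}^m a_i\beta_i \;\middle|\; a_i \in \mathbb{Z}\right\}, \tag{7.25}$$
where $B = [\beta_1, \beta_2, \ldots, \beta_m]$ and each $\beta_i \in \mathbb{R}^n$ is a column vector.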


Proof Equation (7.23) proves the necessity of the condition, so we only prove sufficiency. If a subset $L \subset \mathbb{R}^n$ is given by Eq. (7.25), it is obvious that $L$ is an additive subgroup of $\mathbb{R}^n$: for any $\alpha = Bx_1$, $\beta = Bx_2$ with $x_1, x_2 \in \mathbb{Z}^m$, we have $x = x_1 - x_2 \in \mathbb{Z}^m$ and
$$\alpha - \beta = B(x_1 - x_2) = Bx \in L.$$

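So we only need to prove that $L$ is discrete. Let $T = B'B$; by Lemma 7.13, $T$ is a positive definite real symmetric matrix. Let $\delta_1, \delta_2, \ldots, \delta_m$ be the eigenvalues of $T$ and
$$\delta = \min\{\delta_1, \delta_2, \ldots, \delta_m\} > 0.$$
We prove
$$\min_{x\in\mathbb{Z}^m,\ x\neq 0}|Bx| \ge \sqrt{\delta} > 0. \tag{7.26}$$
By Lemma 7.13, there is an orthogonal matrix $P$ of order $m$ such that
$$P'TP = \operatorname{diag}(\delta_1, \delta_2, \ldots, \delta_m).$$
For any given $x \in \mathbb{Z}^m$, $x \neq 0$, we have
$$|Bx|^2 = x'Tx = x'P(P'TP)P'x \ge \delta|P'x|^2 = \delta|x|^2.$$
Because $x \neq 0$ is an integer vector, $|x|^2 \ge 1$, so
$$|Bx|^2 \ge \delta,\quad \forall x \in \mathbb{Z}^m,\ x \neq 0.$$
Since the difference of two points of $L$ is again of the form $Bx$ with $x \in \mathbb{Z}^m$, this shows that the distance between any two different points in $L$ is at least $\sqrt{\delta} > 0$.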
Therefore, a ball with center $0$ and radius $r$ (take $r$ large enough to contain a nonzero point of $L$, e.g. $r = |\beta_1|$) contains only finitely many points of $L$. Among these finitely many nonzero vectors there is $\alpha \in L$ such that
$$|\alpha| = \min_{x\in L,\ x\neq 0}|x| = \lambda \ge \sqrt{\delta} > 0.$$
According to Definition 7.3, $L$ is a lattice in $\mathbb{R}^n$, and Theorem 7.2 holds.

The following corollary can be deduced directly from the above proof.


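Corollary 7.4 Let $L = L(B) \subset \mathbb{R}^n$ be a lattice with $\operatorname{rank}(L) = m$ and $B \in \mathbb{R}^{n\times m}$, let $\lambda$ be the minimum distance of $L$ and $\delta$ the minimum eigenvalue of $B'B$. Then $\lambda \ge \sqrt{\delta}$.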
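A small numerical check of Corollary 7.4, written as a hedged sketch: it forms $T = B'B$, takes the smallest eigenvalue $\delta$ in closed form (so it assumes rank $m = 2$), and compares $\sqrt{\delta}$ with a brute-force value of $\lambda$. The enumeration bound and all function names are assumptions of the sketch, not part of the corollary.

```python
import itertools
import math

def gram(basis):
    """Gram matrix T = B'B; `basis` lists the columns beta_1, ..., beta_m of B."""
    return [[sum(p * q for p, q in zip(u, v)) for v in basis] for u in basis]

def min_eigenvalue_2x2(T):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    a, b, c = T[0][0], T[0][1], T[1][1]
    return (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)

def shortest_length(basis, bound=6):
    """Brute-force lambda(L) over coefficient vectors with |x_i| <= bound."""
    n, best = len(basis[0]), math.inf
    for x in itertools.product(range(-bound, bound + 1), repeat=len(basis)):
        if any(x):
            v = [sum(xj * b[i] for xj, b in zip(x, basis)) for i in range(n)]
            best = min(best, math.sqrt(sum(c * c for c in v)))
    return best

# A rank-2 lattice in R^3 (m = 2 < n = 3): beta_1 = (1, 0, 1), beta_2 = (0, 1, 1).
basis = [[1, 0, 1], [0, 1, 1]]
delta = min_eigenvalue_2x2(gram(basis))   # eigenvalues of [[2, 1], [1, 2]] are 1 and 3
print(shortest_length(basis), math.sqrt(delta))
```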


Definition 7.4 Let $L \subset \mathbb{R}^n$ be a lattice with $\operatorname{rank}(L) = n$; then $L$ is called a full rank lattice of $\mathbb{R}^n$.


By Theorem 7.2, a necessary and sufficient condition for $L$ to be a full rank lattice of $\mathbb{R}^n$ is the existence of an invertible square matrix $B \in \mathbb{R}^{n\times n}$, $\det(B) \neq 0$, such that
$$L = L(B) = \left\{\sum_{i=1}^n a_i\beta_i \;\middle|\; a_i \in \mathbb{Z},\ 1 \le i \le n\right\} = \{Bx \mid x \in \mathbb{Z}^n\}. \tag{7.27}$$
If $L = L(B)$ is a full rank lattice, define $d = d(L)$ as
$$d = d(L) = |\det(B)|, \tag{7.28}$$
and call $d$ the determinant of $L$. The determinant $d = d(L)$ is the second most important mathematical quantity of a lattice. The lattices discussed below are always assumed to be full rank lattices.

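For a (full rank) lattice, the generating matrix is not unique, but $d = d(L)$ is. To prove this, we first define the so-called unimodular matrices. Define
$$SL_n(\mathbb{Z}) = \{A = (a_{ij})_{n\times n} \mid a_{ij} \in \mathbb{Z},\ \det(A) = \pm 1\}. \tag{7.29}$$
Obviously, $SL_n(\mathbb{Z})$ forms a group under matrix multiplication: the identity matrix $I_n \in SL_n(\mathbb{Z})$, and if $A_1 \in SL_n(\mathbb{Z})$ and $A_2 \in SL_n(\mathbb{Z})$, then $A = A_1A_2 \in SL_n(\mathbb{Z})$. In particular, if $A = (a_{ij})_{n\times n} \in SL_n(\mathbb{Z})$, then the inverse matrix of $A$ is
$$A^{-1} = \pm\begin{pmatrix} a_{11}^* & a_{21}^* & \cdots & a_{n1}^*\\ a_{12}^* & a_{22}^* & \cdots & a_{n2}^*\\ \vdots & \vdots & & \vdots\\ a_{1n}^* & a_{2n}^* & \cdots & a_{nn}^*\end{pmatrix} \in SL_n(\mathbb{Z}),$$
where $a_{ij}^*$ is the algebraic cofactor of $a_{ij}$ and the sign is that of $\det(A)$.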


Lemma 7.14 Let $L = L(B) \subset \mathbb{R}^n$ be a (full rank) lattice and $B_1 \in \mathbb{R}^{n\times n}$. Then $L = L(B) = L(B_1)$ if and only if there is a unimodular matrix $U \in SL_n(\mathbb{Z})$ such that $B = B_1U$.
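Proof If $B = B_1U$ with $U \in SL_n(\mathbb{Z})$, we prove $L(B) = L(B_1)$. Let $\alpha = B_1x \in L(B_1)$, where $x \in \mathbb{Z}^n$; then
$$\alpha = B_1x = B_1UU^{-1}x = BU^{-1}x.$$
Because $U^{-1}x \in \mathbb{Z}^n$, we get $\alpha \in L(B)$, that is, $L(B_1) \subset L(B)$. Similarly, if $\alpha = Bx$ with $x \in \mathbb{Z}^n$, then
$$\alpha = Bx = B_1(Ux),\quad \text{where } Ux \in \mathbb{Z}^n.$$
Thus $\alpha \in L(B_1)$, so $L(B) \subset L(B_1)$, and therefore $L(B) = L(B_1)$.

Conversely, suppose $L(B) = L(B_1)$. Let $B = [\beta_1, \beta_2, \ldots, \beta_n]$ and $B_1 = [\alpha_1, \alpha_2, \ldots, \alpha_n]$, with transition matrix $U$:
$$(\beta_1, \beta_2, \ldots, \beta_n) = (\alpha_1, \alpha_2, \ldots, \alpha_n)U.$$
Since $\beta_i \in L(B_1)$ ($1 \le i \le n$), $U$ is an integer matrix. Similarly,
$$(\alpha_1, \alpha_2, \ldots, \alpha_n) = (\beta_1, \beta_2, \ldots, \beta_n)U_1,$$
and because $\alpha_i \in L(B)$ ($1 \le i \le n$), $U_1$ is also an integer matrix. Because
$$(\beta_1, \beta_2, \ldots, \beta_n) = (\alpha_1, \alpha_2, \ldots, \alpha_n)U = (\beta_1, \beta_2, \ldots, \beta_n)U_1U$$
and $B$ is invertible, we have $U_1U = I_n$. Since $\det(U_1)$ and $\det(U)$ are integers whose product is 1, $\det(U) = \pm 1$, that is, $U \in SL_n(\mathbb{Z})$ and $B = B_1U$. The Lemma holds.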


By Lemma 7.14, if $B$ and $B_1$ are any two generating matrices of a lattice $L$, then
$$|\det(B)| = |\det(B_1)| = d = d(L).$$
That is, the determinant $d(L)$ of a lattice is an invariant.
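Lemma 7.14 and the invariance of $d(L)$ can be checked on a toy example. The sketch below handles only $2\times 2$ matrices and uses hypothetical helper names: it tests membership $v \in L(B)$ by solving $Bx = v$ with Cramer's rule and checking that $x$ is integral. Multiplying $B$ by a unimodular $U$ keeps both the lattice and $|\det|$, while a non-unimodular integer matrix changes both.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def in_lattice(B, v):
    """True if v = Bx has an integer solution x (Cramer's rule, B invertible 2x2)."""
    d = det2(B)
    x1 = Fraction(v[0] * B[1][1] - B[0][1] * v[1], d)
    x2 = Fraction(B[0][0] * v[1] - v[0] * B[1][0], d)
    return x1.denominator == 1 and x2.denominator == 1

def same_lattice(B, C):
    """L(B) = L(C) iff every column of C lies in L(B) and vice versa."""
    return (all(in_lattice(B, col) for col in zip(*C))
            and all(in_lattice(C, col) for col in zip(*B)))

B = [[2, 1], [0, 3]]   # columns (2, 0) and (1, 3); d(L) = 6
U = [[2, 1], [1, 1]]   # integer matrix with det(U) = 1, so U is unimodular
V = [[2, 0], [0, 1]]   # integer matrix with det(V) = 2, not unimodular
print(same_lattice(B, matmul(B, U)), abs(det2(matmul(B, U))))   # prints: True 6
print(same_lattice(B, matmul(B, V)), abs(det2(matmul(B, V))))   # prints: False 12
```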
For a (full rank) lattice $L \subset \mathbb{R}^n$, the dual lattice of $L$ is defined as
$$L^* = \{\alpha \in \mathbb{R}^n \mid \langle\alpha, \beta\rangle \in \mathbb{Z},\ \forall \beta \in L\}. \tag{7.30}$$
Lemma 7.15 Let $L = L(B)$ be a lattice; then the dual lattice of $L$ is $L^* = L((B^{-1})')$, that is, if $B$ is the generating matrix of $L$, then $(B^{-1})'$ is the generating matrix of $L^*$.


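Proof Let
$$L((B^{-1})') = \{(B^{-1})'y \mid y \in \mathbb{Z}^n\}.$$
For any $\alpha \in L((B^{-1})')$, write $\alpha = (B^{-1})'y$ with $y \in \mathbb{Z}^n$; for any $\beta \in L$, write $\beta = Bx$ with $x \in \mathbb{Z}^n$. Then
$$\langle\alpha, \beta\rangle = \alpha'\beta = y'B^{-1}Bx = y'x \in \mathbb{Z}.$$
That means $L((B^{-1})') \subset L^*$. Conversely, let $\alpha \in L^*$, so that $\langle\alpha, \beta\rangle \in \mathbb{Z}$ for all $\beta \in L$. Let $B = [\beta_1, \beta_2, \ldots, \beta_n]$; then
$$\left\langle\alpha, \sum_{i=1}^n x_i\beta_i\right\rangle = \sum_{i=1}^n x_i\langle\alpha, \beta_i\rangle \in \mathbb{Z},\quad \forall x_i \in \mathbb{Z};$$
therefore, for each generating vector $\beta_i$ ($1 \le i \le n$), $\langle\alpha, \beta_i\rangle \in \mathbb{Z}$. Write $\alpha = (y_1, y_2, \ldots, y_n)'$ and $x_i = \langle\alpha, \beta_i\rangle$. Then
$$\langle\alpha, \beta_i\rangle = x_i \in \mathbb{Z}\ (1 \le i \le n) \Rightarrow B'\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix} = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} \in \mathbb{Z}^n.$$
Thus
$$\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix} = (B')^{-1}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix},$$
that is, $\alpha \in L((B')^{-1})$. Because $BB^{-1} = I_n$, we have $(B^{-1})'B' = I_n$, thus $(B^{-1})' = (B')^{-1}$. So $\alpha \in L((B^{-1})')$, that is, $L^* \subset L((B^{-1})')$. Hence $L^* = L((B^{-1})')$. The Lemma holds.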


By Lemma 7.15, since $|\det((B^{-1})')| = |\det(B)|^{-1}$, we immediately have the following corollary.

Corollary 7.5 Let $L = L(B)$ be a full rank lattice and $L^*$ the dual lattice of $L$; then $d(L^*) = d^{-1}(L)$.
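Lemma 7.15 and Corollary 7.5 can be illustrated with exact arithmetic. The sketch below assumes a square invertible $B$ given as a list of rows. It computes $(B^{-1})'$, checks that the pairing of dual and primal basis vectors is the identity matrix, and reads off $d(L^*) = 1/d(L)$. The function names are illustrative only.

```python
from fractions import Fraction

def inverse(M):
    """Inverse of a square matrix (list of rows) by Gauss-Jordan elimination over Q."""
    n = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        p = A[c][c]
        A[c] = [a / p for a in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [row[n:] for row in A]

def dual_basis(B):
    """Generating matrix (B^{-1})' of the dual lattice L* (Lemma 7.15)."""
    return [list(col) for col in zip(*inverse(B))]

B = [[2, 1], [0, 3]]   # d(L) = |det B| = 6
D = dual_basis(B)      # [[1/2, 0], [-1/6, 1/3]]
# <i-th column of D, j-th column of B> = (D'B)_{ij} should be the identity
pairing = [[sum(D[k][i] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(pairing, D[0][0] * D[1][1] - D[0][1] * D[1][0])   # identity, and d(L*) = 1/6
```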


A lattice $L$ can be used to define an equivalence relation on $\mathbb{R}^n$: for all $\alpha, \beta \in \mathbb{R}^n$, we define
$$\alpha \equiv \beta \pmod{L} \iff \alpha - \beta \in L.$$
Obviously, this is an equivalence relation, called congruence modulo $L$.


Definition 7.5 Let $F \subset \mathbb{R}^n$ be a subset. $F$ is called a basic region of a (full rank) lattice $L$ if

(i) for every $x \in \mathbb{R}^n$, there is $\alpha \in F$ such that $x \equiv \alpha \pmod{L}$;
(ii) any two distinct $\alpha_1, \alpha_2 \in F$ satisfy $\alpha_1 \not\equiv \alpha_2 \pmod{L}$.

By definition, a basic region of a lattice is a set of representatives of the additive quotient group $\mathbb{R}^n/L$. Therefore, any basic region of $L$ forms an additive group under addition mod $L$.


Lemma 7.16 Let $L = L(B)$ be a full rank lattice. Then

(i) any two basic regions $F_1$ and $F_2$ of $L$ are isomorphic additive groups (mod $L$);
(ii) $F = \{Bx \mid x = (x_1, x_2, \ldots, x_n)',\ 0 \le x_i < 1,\ 1 \le i \le n\}$ is a basic region of $L(B)$;
(iii) $\operatorname{Vol}(F) = d = d(L)$.


Proof (i) is trivial, because
$$F_1 \cong \mathbb{R}^n/L,\quad F_2 \cong \mathbb{R}^n/L \Rightarrow F_1 \cong F_2.$$


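To prove (ii), let $B = [\beta_1, \beta_2, \ldots, \beta_n]$; then $\{\beta_1, \beta_2, \ldots, \beta_n\}$ is a basis of $\mathbb{R}^n$, and every $\alpha \in \mathbb{R}^n$ can be expressed as a linear combination of $\beta_1, \beta_2, \ldots, \beta_n$, that is,
$$\alpha = \sum_{i=1}^n a_i\beta_i,\quad a_i \in \mathbb{R}.$$
Let $[\alpha]_B = \sum_{i=1}^n [a_i]\beta_i$, where $[a_i]$ is the integer part of $a_i$, and $\{\alpha\}_B = \alpha - [\alpha]_B$. Then $\{\alpha\}_B$ can be expressed as
$$\{\alpha\}_B = B\begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix},\quad \text{where } 0 \le x_i < 1,\ 1 \le i \le n.$$
That is, $\{\alpha\}_B \in F$. Because $\alpha - \{\alpha\}_B = [\alpha]_B \in L$, for any $\alpha \in \mathbb{R}^n$ there is $\{\alpha\}_B \in F$ such that
$$\alpha \equiv \{\alpha\}_B \pmod{L}.$$
Moreover, for two points $\alpha = Bx$ and $\beta = By$ in $F$,
$$\alpha - \beta = B(x - y) = Bz,$$
where $z = (z_1, z_2, \ldots, z_n)'$ with $|z_i| < 1$. If $\alpha \neq \beta$, then $z \neq 0$ is not an integer vector, so $\alpha \not\equiv \beta \pmod{L}$. So $F$ is a basic region of $L$.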
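The decomposition $\alpha = [\alpha]_B + \{\alpha\}_B$ used in the proof of (ii) is easy to compute: take the coordinates of $\alpha$ in the basis $B$ and split each into its integer and fractional parts. Below is a minimal sketch for a $2\times 2$ integer basis and an integer vector $\alpha$; the function name is hypothetical.

```python
import math
from fractions import Fraction

def reduce_mod_lattice(B, alpha):
    """Return ([alpha]_B, {alpha}_B) with alpha = [alpha]_B + {alpha}_B.
    B is an invertible 2x2 integer matrix (list of rows) whose columns generate L;
    alpha is an integer vector, so Fraction(p, q) below receives integers."""
    d = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    # coordinates a of alpha in the basis: B a = alpha (Cramer's rule)
    a = [Fraction(alpha[0] * B[1][1] - B[0][1] * alpha[1], d),
         Fraction(B[0][0] * alpha[1] - alpha[0] * B[1][0], d)]
    k = [math.floor(ai) for ai in a]           # integer parts [a_i]
    frac = [ai - ki for ai, ki in zip(a, k)]   # fractional parts, each in [0, 1)

    def times_B(x):
        return [B[0][0] * x[0] + B[0][1] * x[1], B[1][0] * x[0] + B[1][1] * x[1]]

    return times_B(k), times_B(frac)

lattice_part, representative = reduce_mod_lattice([[2, 1], [0, 3]], [7, 5])
print(lattice_part, representative)   # lattice part (5, 3); representative (2, 2) lies in F
```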

Let us prove (iii). Because all basic regions of $L$ are isomorphic, they have the same volume, and (ii) gives a specific basic region $F$ of $L$, so we only need to prove $\operatorname{Vol}(F) = d = d(L)$. Obviously,
$$\operatorname{Vol}(F) = \int\cdots\int_{y = (y_1, y_2, \ldots, y_n) \in F} dy_1\,dy_2\cdots dy_n.$$
Make the variable substitution $y = Bx$; the Jacobian of this vector-valued map gives
$$dy_1\,dy_2\cdots dy_n = |\det(B)|\,dx_1\cdots dx_n = d\,dx_1\cdots dx_n.$$
Thus
$$\operatorname{Vol}(F) = \int_0^1\cdots\int_0^1 d\,dx_1\cdots dx_n = d.$$
We have completed the proof of Lemma 7.16.


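Next, we discuss the Gram-Schmidt orthogonalization algorithm. If $B = [\beta_1, \beta_2, \ldots, \beta_n]$ is the generating matrix of $L$, then $\{\beta_1, \beta_2, \ldots, \beta_n\}$ can be transformed into a set of orthogonal vectors $\{\beta_1^*, \beta_2^*, \ldots, \beta_n^*\}$, where $\beta_1^* = \beta_1$ and
$$\beta_i^* = \beta_i - \sum_{j=1}^{i-1}\frac{\langle\beta_i, \beta_j^*\rangle}{\langle\beta_j^*, \beta_j^*\rangle}\beta_j^*,\quad 1 < i \le n. \tag{7.31}$$
$\{\beta_1^*, \beta_2^*, \ldots, \beta_n^*\}$ is called the orthogonal basis corresponding to $\{\beta_1, \beta_2, \ldots, \beta_n\}$, and $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ is called the orthogonal matrix corresponding to $B$ (its columns are mutually orthogonal, but not necessarily of unit length). For $1 \le i, j \le n$, denote
$$u_{ii} = 1;\qquad u_{ij} = 0 \ \text{ when } j > i;\qquad u_{ij} = \frac{\langle\beta_i, \beta_j^*\rangle}{\langle\beta_j^*, \beta_j^*\rangle} \ \text{ when } 1 \le j < i \le n;\qquad U = (u_{ij})_{n\times n}. \tag{7.32}$$
Then $U$ is a lower triangular matrix, and
$$\begin{pmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_n\end{pmatrix} = U\begin{pmatrix}\beta_1^*\\ \beta_2^*\\ \vdots\\ \beta_n^*\end{pmatrix}. \tag{7.33}$$
Transposing both sides, we obtain
$$[\beta_1, \beta_2, \ldots, \beta_n] = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]U'. \tag{7.34}$$
Therefore, $U'$ is the transition matrix between the two bases.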
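Formulas (7.31) to (7.33) translate directly into code. The sketch below keeps exact rational arithmetic and returns both the vectors $\beta_i^*$ and the lower triangular matrix $U$. As a convention of this sketch (not of the text), the basis is passed as a list of the vectors $\beta_1, \ldots, \beta_n$, that is, the columns of $B$.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Orthogonalization (7.31) without normalization, over Q.
    Returns (star, U): star[i] = beta_i^* and U = (u_ij) is the lower triangular
    matrix (7.32), so that beta_i = sum_j U[i][j] * star[j] as in (7.33)."""
    n = len(basis)
    star = []
    U = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            U[i][j] = dot(b, star[j]) / dot(star[j], star[j])
            v = [vi - U[i][j] * sj for vi, sj in zip(v, star[j])]
        star.append(v)
    return star, U

star, U = gram_schmidt([[3, 1], [2, 2]])
print(star, U)   # beta_2^* = (-2/5, 6/5), u_21 = 4/5
```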


Lemma 7.17 Let $L = L(B) \subset \mathbb{R}^n$ be a lattice, $B = [\beta_1, \beta_2, \ldots, \beta_n]$ the generating matrix, $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ the corresponding orthogonal matrix, and $d = d(L)$ the determinant of $L$. Then
$$d = \prod_{i=1}^n|\beta_i^*| \le \prod_{i=1}^n|\beta_i|. \tag{7.35}$$

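Proof By (7.34), we have $B = B^*U'$; because $\det(U) = 1$,
$$\det(B) = \det(B^*).$$
By definition,
$$d^2 = \det(B'B) = \det(U(B^*)'B^*U') = \det((B^*)'B^*) = \det(\operatorname{diag}(|\beta_1^*|^2, |\beta_2^*|^2, \ldots, |\beta_n^*|^2)).$$
So
$$d = \prod_{i=1}^n|\beta_i^*|.$$
To prove the inequality on the right of Eq. (7.35), it suffices to prove
$$|\beta_i^*| \le |\beta_i|,\quad 1 \le i \le n. \tag{7.36}$$
Because $\beta_i = \sum_{j=1}^i u_{ij}\beta_j^*$, the orthogonality of the $\beta_j^*$ gives
$$|\beta_i|^2 = \langle\beta_i, \beta_i\rangle = \left\langle\sum_{j=1}^i u_{ij}\beta_j^*, \sum_{j=1}^i u_{ij}\beta_j^*\right\rangle = \langle\beta_i^*, \beta_i^*\rangle + \sum_{j=1}^{i-1}u_{ij}^2\langle\beta_j^*, \beta_j^*\rangle \ge |\beta_i^*|^2.$$
Therefore, the inequality on the right of (7.35) holds, and the Lemma is proved.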


The inequality in Eq. (7.35) is usually called Hadamard's inequality; the above argument gives another proof of it.
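A quick exact check of (7.35) on a 3-dimensional example. The sketch passes the basis vectors as the rows of a matrix; its determinant equals that of $B$ because $\det(B') = \det(B)$. Squared lengths are compared so that everything stays rational. The helper names are illustrative, and the Leibniz determinant is only meant for small $n$.

```python
import itertools
import math
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gs_vectors(basis):
    """Gram-Schmidt vectors beta_i^* of (7.31), computed exactly over Q."""
    star = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for s in star:
            c = dot(b, s) / dot(s, s)
            v = [vi - c * si for vi, si in zip(v, s)]
        star.append(v)
    return star

def det(M):
    """Leibniz formula; fine for the small matrices used here."""
    n, total = len(M), 0
    for perm in itertools.permutations(range(n)):
        inv = sum(perm[a] > perm[b] for a in range(n) for b in range(a + 1, n))
        total += (-1) ** inv * math.prod(M[r][perm[r]] for r in range(n))
    return total

basis = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
d = abs(det(basis))
gs_product_sq = math.prod(dot(s, s) for s in gs_vectors(basis))   # prod |beta_i^*|^2
naive_product_sq = math.prod(dot(b, b) for b in basis)            # prod |beta_i|^2
print(d * d, gs_product_sq, naive_product_sq)   # 324 324 935
```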

In order to define the successive minima of a lattice $L$, we denote the minimum distance of $L$ by $\lambda_1$, that is, $\lambda_1 = \lambda(L)$. Equivalently, $\lambda_1$ is the minimum positive real number $r$ such that the linear space spanned by $L \cap \operatorname{Ball}(0, r)$ is one-dimensional, where
$$\operatorname{Ball}(0, r) = \{x \in \mathbb{R}^n \mid |x| \le r\}$$
is the closed ball with center $0$ and radius $r$. In the same way, the $n$ successive minima $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $L$ can be defined.


Definition 7.6 Let $L = L(B) \subset \mathbb{R}^n$ be a full rank lattice. The $i$-th successive minimum $\lambda_i$ is defined as
$$\lambda_i = \lambda_i(L) = \inf\{r \mid \dim(\operatorname{span}(L \cap \operatorname{Ball}(0, r))) \ge i\}.$$
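The following lemma gives a useful lower bound for the minimum distance $\lambda_1$.

Lemma 7.18 Let $L = L(B) \subset \mathbb{R}^n$ be a (full rank) lattice and $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ the corresponding orthogonal basis. Then
$$\lambda_1 = \lambda(L) \ge \min_{1\le i\le n}|\beta_i^*|. \tag{7.37}$$

Proof For any $x \in \mathbb{Z}^n$, $x \neq 0$, we prove
$$|Bx| \ge \min_{1\le i\le n}|\beta_i^*|.$$
Let $x = (x_1, x_2, \ldots, x_n)' \neq 0$ and let $j$ be the largest subscript with $x_j \neq 0$; then
$$|\langle Bx, \beta_j^*\rangle| = \left|\left\langle\sum_{i=1}^j x_i\beta_i, \beta_j^*\right\rangle\right| = |x_j|\,|\beta_j^*|^2,$$
because $\langle\beta_i, \beta_j^*\rangle = 0$ when $i < j$, and $\langle\beta_j, \beta_j^*\rangle = \langle\beta_j^*, \beta_j^*\rangle$. On the other hand,
$$|\langle Bx, \beta_j^*\rangle| \le |Bx|\,|\beta_j^*|.$$
So
$$|Bx| \ge |x_j|\,|\beta_j^*| \ge |\beta_j^*| \ge \min_{1\le i\le n}|\beta_i^*|.$$
Lemma 7.18 holds.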


Corollary 7.6 The successive minima $\lambda_1, \lambda_2, \ldots, \lambda_n$ of a lattice $L$ are attained, that is, there exist $\alpha_i \in L$ with $|\alpha_i| = \lambda_i$, $1 \le i \le n$.

Proof The ball $\operatorname{Ball}(0, \delta)$ with center $0$ and radius $\delta$ ($\delta > \lambda_n$) contains only finitely many lattice points: if a bounded region (of finite volume) contained infinitely many lattice points, they would have a convergent subsequence, but the distance between any two different points of $L$ is at least $\lambda_1$. Hence
$$|L \cap \operatorname{Ball}(0, \delta)| < \infty,$$
and among these finitely many lattice points it is not hard to find $\alpha_1 \in L$ with $|\alpha_1| = \lambda_1$, $\alpha_2 \in L$ with $|\alpha_2| = \lambda_2$, $\ldots$, $\alpha_n \in L$ with $|\alpha_n| = \lambda_n$. The Corollary holds.


In Sect. 7.1, the geometry of numbers was developed for the integer lattice $\mathbb{Z}^n$; next, we extend the main results to general full rank lattices.




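Lemma 7.19 (Compare with Lemma 7.5) Let $L = L(B) \subset \mathbb{R}^n$ be a (full rank) lattice and $\mathscr{R} \subset \mathbb{R}^n$. If $\operatorname{Vol}(\mathscr{R}) > d(L)$, then there are two different points $\alpha \in \mathscr{R}$, $\beta \in \mathscr{R}$ such that $\alpha - \beta \in L$.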


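Proof Let $F$ be a basic region of $L$, namely
$$F = \{Bx \mid x = (x_1, \ldots, x_n)',\ 0 \le x_i < 1,\ 1 \le i \le n\}.$$
Obviously, $\mathbb{R}^n$ can be divided into the following disjoint subsets:
$$\mathbb{R}^n = \bigcup_{\alpha\in L}\{\alpha + y \mid y \in F\} = \bigcup_{\alpha\in L}\{\alpha + F\}.$$
For a given lattice point $\alpha \in L$, define
$$\mathscr{R}_\alpha = \mathscr{R} \cap \{\alpha + F\} = \alpha + D_\alpha,\quad D_\alpha \subset F.$$
Therefore, $\mathscr{R}$ can be divided into the following disjoint subsets:
$$\mathscr{R} = \bigcup_{\alpha\in L}\mathscr{R}_\alpha \Rightarrow \operatorname{Vol}(\mathscr{R}) = \sum_{\alpha\in L}\operatorname{Vol}(\mathscr{R}_\alpha) = \sum_{\alpha\in L}\operatorname{Vol}(D_\alpha).$$
If $D_\alpha \cap D_\beta = \emptyset$ for any $\alpha, \beta \in L$ with $\alpha \neq \beta$, then
$$\operatorname{Vol}(\mathscr{R}) = \operatorname{Vol}\Big(\bigcup_{\alpha\in L}D_\alpha\Big) \le \operatorname{Vol}(F) = d(L),$$
which contradicts the assumption. So there must exist $\alpha, \beta \in L$, $\alpha \neq \beta$, such that $D_\alpha \cap D_\beta \neq \emptyset$. Let $x \in D_\alpha \cap D_\beta$; then $\alpha + x \in \mathscr{R}$, $\beta + x \in \mathscr{R}$, and
$$(\alpha + x) - (\beta + x) = \alpha - \beta \in L.$$
The Lemma holds.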


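Lemma 7.20 (Compare with Lemma 7.6) Let $L$ be a full rank lattice and $\mathscr{R} \subset \mathbb{R}^n$ a symmetric convex body with $\operatorname{Vol}(\mathscr{R}) > 2^n d(L)$. Then $\mathscr{R}$ contains a nonzero lattice point, that is, there is $\alpha \in L$, $\alpha \neq 0$, such that $\alpha \in \mathscr{R}$.

Proof Let
$$\tfrac{1}{2}\mathscr{R} = \{x \mid 2x \in \mathscr{R}\}.$$
Then
$$\operatorname{Vol}\left(\tfrac{1}{2}\mathscr{R}\right) = 2^{-n}\operatorname{Vol}(\mathscr{R}) > d(L).$$
By Lemma 7.19, there are $x, y \in \tfrac{1}{2}\mathscr{R}$, $x \neq y$, such that $x - y \in L$. Since $2x \in \mathscr{R}$, $2y \in \mathscr{R}$ and $\mathscr{R}$ is a symmetric convex body, Lemma 7.4 gives
$$\tfrac{1}{2}(2x - 2y) = x - y \in \mathscr{R},$$
and $x - y$ is a nonzero lattice point. The Lemma holds.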


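Corollary 7.7 Let $L$ be a full rank lattice and $\lambda(L) = \lambda_1$ the minimum distance of $L$. Then
$$\lambda_1 = \lambda(L) \le \sqrt{n}\,(d(L))^{1/n}. \tag{7.38}$$
Proof First we prove
$$\operatorname{Vol}(\operatorname{Ball}(0, r)) \ge \left(\frac{2r}{\sqrt{n}}\right)^n. \tag{7.39}$$
This is because $\operatorname{Ball}(0, r)$ contains the cube
$$\left\{x \in \mathbb{R}^n \;\middle|\; x = (x_1, \ldots, x_n),\ |x_i| < \frac{r}{\sqrt{n}}\right\} \subset \operatorname{Ball}(0, r).$$
By definition, there are no nonzero lattice points in the open ball $\operatorname{Ball}(0, \lambda_1)$. Because $\operatorname{Ball}(0, \lambda_1)$ is a symmetric convex body, Lemma 7.20 gives
$$\operatorname{Vol}(\operatorname{Ball}(0, \lambda_1)) \le 2^n d(L).$$
Thus
$$\left(\frac{2\lambda_1}{\sqrt{n}}\right)^n \le 2^n d(L),$$
that is,
$$\lambda_1 \le \sqrt{n}\,(d(L))^{1/n}.$$
The Corollary holds.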


Combined with Eq. (7.37), we obtain lower and upper bounds for the minimum distance of a lattice:
$$\min_{1\le i\le n}|\beta_i^*| \le \lambda(L) \le \sqrt{n}\,(d(L))^{1/n}. \tag{7.40}$$
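The two sides of (7.40) can be compared numerically. In this floating-point sketch the lower bound uses the Gram-Schmidt lengths of the given basis (so it depends on the order of the basis vectors), the upper bound uses $d(L)$, and $\lambda(L)$ is found by brute-force enumeration under an assumed coefficient bound. All names are illustrative.

```python
import itertools
import math

def gs_lengths(basis):
    """|beta_i^*| for the Gram-Schmidt vectors of (7.31), in floating point."""
    star = []
    for b in basis:
        v = [float(x) for x in b]
        for s in star:
            c = sum(p * q for p, q in zip(b, s)) / sum(q * q for q in s)
            v = [vi - c * si for vi, si in zip(v, s)]
        star.append(v)
    return [math.sqrt(sum(x * x for x in v)) for v in star]

def shortest_length(basis, bound=6):
    """Brute-force lambda(L) over coefficient vectors with |x_i| <= bound."""
    n, best = len(basis[0]), math.inf
    for x in itertools.product(range(-bound, bound + 1), repeat=len(basis)):
        if any(x):
            v = [sum(xj * b[i] for xj, b in zip(x, basis)) for i in range(n)]
            best = min(best, math.sqrt(sum(c * c for c in v)))
    return best

basis = [[3, 1], [2, 2]]   # beta_1 = (3, 1), beta_2 = (2, 2); d(L) = |3*2 - 2*1| = 4
n, d = 2, abs(basis[0][0] * basis[1][1] - basis[1][0] * basis[0][1])
lower = min(gs_lengths(basis))
lam = shortest_length(basis)
upper = math.sqrt(n) * d ** (1 / n)
print(lower, lam, upper)   # roughly 1.265 <= 1.414 <= 2.828
```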


Lemma 7.21 Let $L \subset \mathbb{R}^n$ be a (full rank) lattice, $\lambda_1, \lambda_2, \ldots, \lambda_n$ the successive minima of $L$, and $d = d(L)$ the determinant of $L$. Then
$$\lambda_1\lambda_2\cdots\lambda_n \le n^{n/2}d(L). \tag{7.41}$$


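Proof Let $\{\alpha_1, \alpha_2, \ldots, \alpha_n\} \subset L$ with $|\alpha_i| = \lambda_i$ be a basis of $\mathbb{R}^n$. Let
$$T = \left\{y \in \mathbb{R}^n \;\middle|\; \sum_{i=1}^n\left(\frac{\langle y, \alpha_i^*\rangle}{|\alpha_i^*|\lambda_i}\right)^2 < 1\right\}, \tag{7.42}$$
where $\{\alpha_1^*, \alpha_2^*, \ldots, \alpha_n^*\}$ is the orthogonal basis corresponding to $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$. Let us prove that $T$ does not contain any nonzero lattice point. Let $y \in L$, $y \neq 0$, and let $k$ be the largest subscript such that $|y| \ge \lambda_k$; then
$$y \in \operatorname{Span}(\alpha_1^*, \alpha_2^*, \ldots, \alpha_k^*) = \operatorname{Span}(\alpha_1, \alpha_2, \ldots, \alpha_k).$$
Indeed, if $y$ were linearly independent of $\alpha_1, \alpha_2, \ldots, \alpha_k$, then, since $\alpha_1, \ldots, \alpha_k, y$ all lie in $L \cap \operatorname{Ball}(0, |y|)$,
$$k + 1 \le \dim(\operatorname{Span}(L \cap \operatorname{Ball}(0, |y|))),$$
and $\lambda_{k+1} \le |y|$ would follow from the definition of $\lambda_{k+1}$, which contradicts the definition of $k$. Since $y \in \operatorname{Span}(\alpha_1^*, \ldots, \alpha_k^*)$ and $\lambda_i \le \lambda_k$ for $i \le k$,
$$\sum_{i=1}^n\left(\frac{\langle y, \alpha_i^*\rangle}{|\alpha_i^*|\lambda_i}\right)^2 = \sum_{i=1}^k\frac{\langle y, \alpha_i^*\rangle^2}{|\alpha_i^*|^2\lambda_i^2} \ge \frac{1}{\lambda_k^2}\sum_{i=1}^k\frac{\langle y, \alpha_i^*\rangle^2}{|\alpha_i^*|^2} = \frac{|y|^2}{\lambda_k^2} \ge 1.$$
Therefore $y \notin T$. Because $T$ is a symmetric convex body without nonzero lattice points, Lemma 7.20 gives
$$\operatorname{Vol}(T) \le 2^n d.$$
On the other hand, by (7.39),
$$\operatorname{Vol}(T) = \left(\prod_{i=1}^n\lambda_i\right)\operatorname{Vol}(\operatorname{Ball}(0, 1)) \ge \left(\prod_{i=1}^n\lambda_i\right)\left(\frac{2}{\sqrt{n}}\right)^n.$$
So
$$\prod_{i=1}^n\lambda_i \le n^{n/2}d.$$
Lemma 7.21 holds.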


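The above lemma shows that the upper bound (7.38) for $\lambda_1$ also holds for $\lambda_1, \ldots, \lambda_n$ in the sense of the geometric mean, namely $(\lambda_1\lambda_2\cdots\lambda_n)^{1/n} \le \sqrt{n}\,(d(L))^{1/n}$.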

Finally, we discuss computationally hard problems on lattices. These problems are the main scientific basis and technical support for the design of trapdoor functions, and they are also the cornerstone of the security of lattice cryptography.


1. Shortest vector problem (SVP)

A lattice $L$ is a discrete geometric object in $\mathbb{R}^n$, and its minimum distance $\lambda_1 = \lambda(L)$ is the length of the shortest nonzero vector in $L$. For an arbitrary full rank lattice $L$, how can we find a shortest vector $u_0 \in L$, that is, a vector with
$$|u_0| = \min_{x\in L,\ x\neq 0}|x| = \lambda_1?$$
This is the so-called shortest vector problem. At present it faces insurmountable difficulties in theory and computation: we know that $u_0$ exists, but no efficient (polynomial-time) algorithm for computing it is known. Consequently, current research mainly focuses on approximating the shortest vector. The shortest vector approximation problem asks for a nonzero vector $u \in L$ such that
$$|u| \le r(n)\lambda_1,\quad u \in L,\ u \neq 0,$$
where $r(n) \ge 1$ is called the approximation factor, which depends only on the dimension of the lattice $L$.

In 1982, H. W. Lenstra, A. K. Lenstra and L. Lovász developed, in their 1982 paper, a set of algorithms that efficiently solve the approximation problem for the shortest vector; this is the famous LLL algorithm of lattice theory. The LLL algorithm runs in polynomial time, and its approximation factor is $r(n) = 2^{(n-1)/2}$. How to improve the approximation factor of the LLL algorithm to a polynomial in $n$ is the main research topic at present. For example, Schnorr's work in 1987 and Gama and Nguyen's work (2008a, 2008b) are very representative, but their approximation factors are still far from polynomial, so the academic community generally conjectures:

Conjecture 1: There is no polynomial-time algorithm that approximates the shortest vector with an approximation factor $r(n)$ that is a polynomial function of $n$.
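For readers who want to experiment, here is a compact textbook-style sketch of LLL reduction with the customary parameter $\delta = 3/4$. The parameter value and all names are assumptions of this sketch, not taken from the text. It uses exact rationals and recomputes the Gram-Schmidt data after every basis change, which keeps it simple and correct but makes it far slower than practical implementations.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(b):
    """Gram-Schmidt vectors b_i^* and coefficients mu[i][j] = <b_i, b_j^*>/<b_j^*, b_j^*>."""
    star, mu = [], [[Fraction(0)] * len(b) for _ in b]
    for i, v in enumerate(b):
        w = [Fraction(x) for x in v]
        for j in range(i):
            mu[i][j] = dot(v, star[j]) / dot(star[j], star[j])
            w = [wi - mu[i][j] * sj for wi, sj in zip(w, star[j])]
        star.append(w)
    return star, mu

def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of linearly independent integer vectors (exact arithmetic)."""
    b = [list(v) for v in basis]
    star, mu = gram_schmidt(b)
    k = 1
    while k < len(b):
        for j in range(k - 1, -1, -1):   # size reduction: make |mu[k][j]| <= 1/2
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                star, mu = gram_schmidt(b)
        # Lovasz condition: |b_k^*|^2 >= (delta - mu_{k,k-1}^2) |b_{k-1}^*|^2
        if dot(star[k], star[k]) >= (delta - mu[k][k - 1] ** 2) * dot(star[k - 1], star[k - 1]):
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            star, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b

print(lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))   # [[0, 1, 0], [1, 0, 1], [-1, 0, 2]]
```

The output vectors are short and nearly orthogonal, and the lattice is unchanged, since only unimodular operations (size reduction and swaps) are applied.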


2. Closest vector problem (CVP)

Let $L \subset \mathbb{R}^n$ be a lattice and $t \in \mathbb{R}^n$ an arbitrary given vector. It is easy to prove that there is a lattice point $u_t \in L$ such that
$$|u_t - t| = \min_{x\in L}|x - t|;$$
$u_t$ is called the closest lattice point (vector) to $t$. When $t = 0$ and the trivial point $x = 0$ is excluded from the minimum, $u_0$ is a shortest vector of $L$; in this sense the closest vector problem generalizes the shortest vector problem. Similarly, we only know that the closest vector $u_t$ exists, and no efficient algorithm for finding it is known, so one studies its approximation instead: a lattice point $x \in L$ with
$$|x - t| \le r_1(n)|u_t - t|$$
is called an approximate closest vector with approximation factor $r_1(n)$. In 1986, Babai proposed an effective algorithm to approximate the closest vector in Babai (1986), and its approximation factor $r_1(n)$ is generally of the same order as the approximation factor $r(n)$ of the shortest vector.
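Babai (1986) describes more than one method. The sketch below shows the nearest-plane variant, assuming the input basis is already reduced, since the approximation guarantee relies on an LLL-reduced basis. It walks from the last Gram-Schmidt vector to the first, each time subtracting the nearest integer multiple of the corresponding basis vector from the residual. The function names are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gs_vectors(basis):
    """Gram-Schmidt vectors beta_i^* of (7.31), computed exactly over Q."""
    star = []
    for b in basis:
        v = [Fraction(x) for x in b]
        for s in star:
            c = dot(b, s) / dot(s, s)
            v = [vi - c * si for vi, si in zip(v, s)]
        star.append(v)
    return star

def babai_nearest_plane(basis, t):
    """Approximate closest vector to t in L(basis) by the nearest-plane method.
    Its approximation guarantee assumes an LLL-reduced basis."""
    star = gs_vectors(basis)
    residual = [Fraction(x) for x in t]
    for j in reversed(range(len(basis))):
        c = round(dot(residual, star[j]) / dot(star[j], star[j]))
        residual = [r - c * bj for r, bj in zip(residual, basis[j])]
    return [Fraction(x) - r for x, r in zip(t, residual)]   # t - residual lies in L

t = [Fraction(12, 5), Fraction(31, 10)]          # the target (2.4, 3.1)
print(babai_nearest_plane([[2, 0], [1, 2]], t))   # the lattice point (2, 4)
```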
There are many other hard computational problems on lattices, such as the Successive Shortest Vector Problem, which essentially asks for an algorithm that finds (or approximates) lattice vectors $\alpha_i \in L$ with $|\alpha_i| = \lambda_i$, where $\lambda_1, \ldots, \lambda_n$ are the successive minima of $L$. However, SVP and CVP are the problems most commonly used in the design and analysis of lattice cryptosystems, and most of the research is based on integer lattices.


7.3 Integer Lattice and q-Ary Lattice


Definition 7.7 A full rank lattice $L$ is called an integer lattice if $L \subset \mathbb{Z}^n$; an integer lattice $L$ is called a $q$-ary lattice if $q\mathbb{Z}^n \subset L \subset \mathbb{Z}^n$, where $q > 1$ is a positive integer.


It is easy to see from the definition that a lattice $L = L(B)$ is an integer lattice if and only if $B \in \mathbb{Z}^{n\times n}$ is an integer square matrix, so the determinant $d = d(L)$ of an integer lattice $L$ is a positive integer.


Lemma 7.22 Let $L = L(B) \subset \mathbb{Z}^n$ be an integer lattice and $d = d(L)$ its determinant; then $d\mathbb{Z}^n \subset L \subset \mathbb{Z}^n$. Therefore, an integer lattice is always a $d$-ary lattice ($q = d$).


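Proof Let $\alpha \in d\mathbb{Z}^n$; we prove $\alpha \in L$, that is, $\alpha = Bx$ always has an integer solution $x \in \mathbb{Z}^n$. Let $B^{-1}$ be the inverse matrix of $B$; then
$$B^{-1} = \frac{1}{\det(B)}B^* = \frac{1}{\det(B)}\begin{pmatrix} b_{11}^* & b_{21}^* & \cdots & b_{n1}^*\\ b_{12}^* & b_{22}^* & \cdots & b_{n2}^*\\ \vdots & \vdots & & \vdots\\ b_{1n}^* & b_{2n}^* & \cdots & b_{nn}^*\end{pmatrix},$$
where $B = (b_{ij})_{n\times n}$ and $b_{ij}^*$ is the algebraic cofactor of $b_{ij}$. Because $B \in \mathbb{Z}^{n\times n}$, the adjugate matrix $B^* \in \mathbb{Z}^{n\times n}$, thus $dB^{-1} = \pm B^* \in \mathbb{Z}^{n\times n}$. Write $\alpha = d\beta$ with $\beta \in \mathbb{Z}^n$; then
$$x = B^{-1}\alpha = dB^{-1}\beta = \pm B^*\beta \in \mathbb{Z}^n.$$
Thus $\alpha \in L$, that is, $d\mathbb{Z}^n \subset L$, and the Lemma holds.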
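The proof of Lemma 7.22 is constructive: the solution of $Bx = de_i$ is $\pm$ the $i$-th column of the adjugate $B^*$. A minimal sketch with hypothetical helper names, using the Leibniz formula for determinants (adequate only for small $n$), with $B$ given as a list of rows:

```python
import itertools
import math

def det(M):
    """Leibniz formula; adequate for the small integer matrices used here."""
    n, total = len(M), 0
    for perm in itertools.permutations(range(n)):
        inv = sum(perm[a] > perm[b] for a in range(n) for b in range(a + 1, n))
        total += (-1) ** inv * math.prod(M[r][perm[r]] for r in range(n))
    return total

def adjugate(M):
    """The matrix B^* in the proof of Lemma 7.22: entry (j, i) is the cofactor of b_ij."""
    n = len(M)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
            adj[j][i] = (-1) ** (i + j) * det(minor)
    return adj

def preimages_of_d_unit_vectors(B):
    """d = |det B| and integer vectors x_i with B x_i = d e_i (x_i = +-column i of B^*)."""
    D = det(B)
    sign = 1 if D > 0 else -1
    adj = adjugate(B)
    n = len(B)
    return abs(D), [[sign * adj[r][i] for r in range(n)] for i in range(n)]

B = [[2, 1], [0, 3]]
d, xs = preimages_of_d_unit_vectors(B)
print(d, xs)   # 6 [[3, 0], [-1, 2]]
```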


The following lemma is a simple result in algebra; for completeness, we give its proof.

Lemma 7.23 Let $L$ be a $q$-ary lattice and $\mathbb{Z}_q$ the residue class ring mod $q$. Then

(i) $\mathbb{Z}^n/q\mathbb{Z}^n \cong \mathbb{Z}_q^n$ (additive group isomorphism);
(ii) $\mathbb{Z}^n/L \cong \mathbb{Z}_q^n/(L/q\mathbb{Z}^n)$ (additive group isomorphism). Therefore, $L/q\mathbb{Z}^n$ is a linear code in $\mathbb{Z}_q^n$.


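Proof For $\alpha = (a_1, a_2, \ldots, a_n) \in \mathbb{Z}^n$ and $\beta = (b_1, b_2, \ldots, b_n) \in \mathbb{Z}^n$, if $a_i \equiv b_i \pmod q$ for all $i$, we write $\alpha \equiv \beta \pmod q$. For any $\alpha \in \mathbb{Z}^n$, define
$$\bar{\alpha} = (\bar{a}_1, \bar{a}_2, \ldots, \bar{a}_n) \in \mathbb{Z}_q^n,$$
where $\bar{a}_i$ is the least nonnegative residue of $a_i$ mod $q$; thus $\alpha \equiv \bar{\alpha} \pmod q$. Define the mapping $\sigma: \mathbb{Z}^n \to \mathbb{Z}_q^n$ by $\sigma(\alpha) = \bar{\alpha}$; this is a surjection, and
$$\sigma(\alpha + \beta) = \overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta} = \sigma(\alpha) + \sigma(\beta).$$
Therefore, $\sigma$ is a surjective group homomorphism. Obviously $\ker\sigma = q\mathbb{Z}^n$; therefore, by the isomorphism theorem of groups, we have
$$\mathbb{Z}^n/q\mathbb{Z}^n \cong \mathbb{Z}_q^n.$$
Because $q\mathbb{Z}^n \subset L \subset \mathbb{Z}^n$, the isomorphism theorem of groups also gives
$$\mathbb{Z}^n/L \cong (\mathbb{Z}^n/q\mathbb{Z}^n)/(L/q\mathbb{Z}^n) \cong \mathbb{Z}_q^n/(L/q\mathbb{Z}^n).$$
The Lemma holds.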


Next, we prove that $\mathbb{Z}^n/L$ is a finite group. To this end, we first discuss elementary transformations of matrices, that is, elementary row transformations and elementary column transformations. Specifically, there are the following three kinds:

(1) Interchange two rows or two columns of a matrix $A$:

$\sigma_{ij}(A)$: interchange rows $i$ and $j$ of $A$;
$\tau_{ij}(A)$: interchange columns $i$ and $j$ of $A$.

(2) Multiply a row or a column of $A$ by $-1$:

$\sigma_{-i}(A)$: multiply row $i$ of $A$ by $-1$;
$\tau_{-i}(A)$: multiply column $i$ of $A$ by $-1$.

(3) Add $k$ times a row (column) to another row (column), $k \in \mathbb{R}$; in many cases we require $k \in \mathbb{Z}$:

$\sigma_{ki+j}(A)$: add $k$ times row $i$ of $A$ to row $j$;
$\tau_{ki+j}(A)$: add $k$ times column $i$ of $A$ to column $j$.

The identity matrix of order $n$ is denoted by $I_n$; a matrix obtained from $I_n$ by one of the above elementary transformations is called an elementary matrix. We note that all elementary matrices (with $k \in \mathbb{Z}$ in type (3)) are unimodular matrices (see (7.29)), and
$$\begin{aligned} \sigma_{ij}(A) &= \sigma_{ij}(I_n)A, & \tau_{ij}(A) &= A\tau_{ij}(I_n),\\ \sigma_{-i}(A) &= \sigma_{-i}(I_n)A, & \tau_{-i}(A) &= A\tau_{-i}(I_n),\\ \sigma_{ki+j}(A) &= \sigma_{ki+j}(I_n)A, & \tau_{ki+j}(A) &= A\tau_{ki+j}(I_n). \end{aligned} \tag{7.43}$$
That is, an elementary row transformation of $A$ equals multiplying $A$ on the left by the corresponding elementary matrix, and an elementary column transformation of $A$ equals multiplying $A$ on the right by the corresponding elementary matrix.


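Lemma 7.24 Let $L = L(B) \subset \mathbb{Z}^n$ be an integer lattice; then $\mathbb{Z}^n/L$ is a finite group, and
$$|\mathbb{Z}^n/L| = d(L).$$

Proof By linear algebra, an integer square matrix $B \in \mathbb{Z}^{n\times n}$ can always be transformed into a lower triangular matrix by integer elementary row transformations; that is, there is a unimodular matrix $U \in SL_n(\mathbb{Z})$ such that
$$UB = \begin{pmatrix} * & 0 & \cdots & 0\\ * & * & \cdots & 0\\ \vdots & & \ddots & \vdots\\ * & * & \cdots & *\end{pmatrix}.$$
Continuing with further integer elementary row and column transformations, $UB$ can be brought to a diagonal matrix; that is, there are unimodular matrices $U, U_1 \in SL_n(\mathbb{Z})$ such that
$$UBU_1 = \operatorname{diag}\{\delta_1, \delta_2, \ldots, \delta_n\},$$
where $\delta_i \neq 0$, $\delta_i \in \mathbb{Z}$, and
$$d(L) = |\det(UBU_1)| = \prod_{i=1}^n|\delta_i|.$$
Let $L(UBU_1)$ be the integer lattice generated by $UBU_1$; we have the quotient group isomorphism
$$\mathbb{Z}^n/L(UBU_1) \cong \bigoplus_{i=1}^n\mathbb{Z}/|\delta_i|\mathbb{Z} = \bigoplus_{i=1}^n\mathbb{Z}_{|\delta_i|}.$$
Thus
$$|\mathbb{Z}^n/L(UBU_1)| = \prod_{i=1}^n|\delta_i| = d(L).$$
By Lemma 7.14, $L(UBU_1) = L(UB)$. Moreover, the map $x \mapsto Ux$ is an automorphism of $\mathbb{Z}^n$ that carries $L(B)$ onto $L(UB)$, so $\mathbb{Z}^n/L(B) \cong \mathbb{Z}^n/L(UB)$. Hence
$$|\mathbb{Z}^n/L(B)| = |\mathbb{Z}^n/L(UBU_1)| = d(L).$$
Lemma 7.24 holds.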
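Lemma 7.24 can be checked by counting. By Lemma 7.16 (ii), each coset of $L$ in $\mathbb{Z}^n$ has exactly one representative in the half-open parallelepiped $F = \{Bx \mid 0 \le x_i < 1\}$, so counting the integer points of $F$ gives $|\mathbb{Z}^n/L|$. Below is a $2\times 2$ sketch; the function name is hypothetical.

```python
from fractions import Fraction

def count_residues_2d(B):
    """Count integer points p in F = {Bx : 0 <= x_i < 1} for a nonsingular 2x2 integer B
    (list of rows); by Lemma 7.16 (ii) this equals |Z^2 / L(B)|."""
    (b11, b12), (b21, b22) = B
    d = b11 * b22 - b12 * b21
    corners = [(0, 0), (b11, b21), (b12, b22), (b11 + b12, b21 + b22)]
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    count = 0
    for p0 in range(min(xs), max(xs) + 1):          # bounding box of the parallelepiped
        for p1 in range(min(ys), max(ys) + 1):
            a1 = Fraction(p0 * b22 - b12 * p1, d)   # coordinates of p in the basis
            a2 = Fraction(b11 * p1 - p0 * b21, d)
            if 0 <= a1 < 1 and 0 <= a2 < 1:
                count += 1
    return count

print(count_residues_2d([[2, 1], [0, 3]]), count_residues_2d([[3, 2], [1, 2]]))   # 6 4
```

In both examples the count equals $|\det(B)|$, as Lemma 7.24 predicts.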


An integer square matrix $B = (b_{ij})_{n\times n} \in \mathbb{Z}^{n\times n}$ is called a Hermite normal form matrix if $B$ is upper triangular, that is, $b_{ij} = 0$ for $1 \le j < i \le n$, and
$$b_{ii} > 0\ (1 \le i \le n),\qquad 0 \le b_{ij} < b_{ii}\ (1 \le i < j \le n).$$
A Hermite normal form matrix is referred to as an HNF matrix.

Definition 7.8 Let $L = L(B) \subset \mathbb{Z}^n$ be an integer lattice with $B$ an HNF matrix; then $B$ is called the HNF basis of $L$, denoted $B = \operatorname{HNF}(L)$.

The following lemma proves that an integer lattice has a unique HNF basis, so it is reasonable to use $\operatorname{HNF}(L)$ to denote it.


Lemma 7.25 Let $L \subset \mathbb{Z}^n$ be an integer lattice; then there is a unique HNF matrix $B$ such that $L = L(B)$.

Proof Let $L = L(A)$, where $A$ is a generating matrix of $L$. Using integer elementary column transformations, $A$ can be transformed into an upper triangular matrix, that is,
$$AU_1 = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ 0 & c_{22} & \cdots & c_{2n}\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & c_{nn}\end{pmatrix},\quad U_1 \in SL_n(\mathbb{Z}),$$
where $c_{ii} > 0$, $1 \le i \le n$. Continuing to transform $AU_1$ by column transformations, we obtain a unimodular matrix $U_2$ such that $AU_1U_2 = B$ is an HNF matrix. Because $L(B) = L(AU_1U_2) = L(A)$, the lattice $L$ has the HNF basis $B$.

Let us prove the uniqueness of the HNF basis. Suppose there are two HNF matrices $B_1, B_2$ with $L(B_1) = L(B_2)$. By Lemma 7.14, there is a unimodular matrix $U \in SL_n(\mathbb{Z})$ such that $B_1 = B_2U$; that is, $B_1$ can be obtained from $B_2$ by successively applying the elementary column transformations of (7.43). But applying any nontrivial column transformation $\tau_{ij}$, $\tau_{-i}$ or $\tau_{ki+j}$ to $B_2$ does not yield an HNF matrix, so $U = I_n$ is the identity matrix, that is, $B_1 = B_2$. The Lemma holds.
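The existence part of Lemma 7.25 suggests an algorithm. Working from the bottom row up, it clears the entries left of the diagonal with extended-gcd column operations (each a unimodular $2\times 2$ transformation of two columns), makes the pivot positive, and then reduces the entries to the right of the pivot. The sketch assumes a nonsingular integer $B$ and the conventions $b_{ii} > 0$ and $0 \le b_{ij} < b_{ii}$ for $j > i$; the names are hypothetical.

```python
def egcd(a, b):
    """Return (g, s, t) with s*a + t*b = g = gcd(a, b) >= 0."""
    old_r, r, old_s, s, old_t, t = a, b, 1, 0, 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

def hnf(B):
    """Upper triangular HNF of a nonsingular integer matrix B (list of rows),
    computed with unimodular column operations only."""
    H = [list(row) for row in B]
    n = len(H)
    for i in range(n - 1, -1, -1):
        # clear row i left of the diagonal by combining column j with column i
        for j in range(i):
            if H[i][j] != 0:
                g, s, t = egcd(H[i][j], H[i][i])
                u, v = H[i][j] // g, H[i][i] // g
                for r in range(n):
                    x, y = H[r][j], H[r][i]
                    # 2x2 column transformation with determinant v*t + u*s = 1
                    H[r][j], H[r][i] = v * x - u * y, s * x + t * y
        if H[i][i] < 0:                 # make the pivot positive
            for r in range(n):
                H[r][i] = -H[r][i]
        for j in range(i + 1, n):       # reduce entries right of the pivot
            q = H[i][j] // H[i][i]
            for r in range(n):
                H[r][j] -= q * H[r][i]
    return H

print(hnf([[2, 0], [1, 3]]))   # [[6, 2], [0, 1]]
print(hnf([[2, 2], [1, 4]]))   # same lattice, same HNF: [[6, 2], [0, 1]]
```

The second input is $BU$ with $U = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}$, so it generates the same lattice, and both calls return the same matrix, as uniqueness predicts.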


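Lemma 7.26 Let $L = L(B)$ be an integer lattice, $B = (b_{ij})_{n\times n}$ an HNF matrix, and $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ the orthogonal basis corresponding to $B = [\beta_1, \beta_2, \ldots, \beta_n]$. Then
$$B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*] = \operatorname{diag}\{b_{11}, b_{22}, \ldots, b_{nn}\}$$
is a diagonal matrix.

Proof We prove $\beta_i^* = (0, \ldots, 0, b_{ii}, 0, \ldots, 0)'$ (with $b_{ii}$ in position $i$) by induction on $i$. When $i = 1$, $\beta_1^* = \beta_1 = (b_{11}, 0, \ldots, 0)'$, and the proposition holds. Suppose $\beta_j^* = (0, \ldots, 0, b_{jj}, 0, \ldots, 0)'$ holds for all $j \le i$; then for $i + 1$, by (7.31),
$$\beta_{i+1}^* = \beta_{i+1} - \sum_{j=1}^i\frac{\langle\beta_{i+1}, \beta_j^*\rangle}{\langle\beta_j^*, \beta_j^*\rangle}\beta_j^* = \beta_{i+1} - \sum_{j=1}^i\frac{b_{j,i+1}b_{jj}}{b_{jj}^2}\beta_j^* = \begin{pmatrix}b_{1,i+1}\\ \vdots\\ b_{i,i+1}\\ b_{i+1,i+1}\\ 0\\ \vdots\\ 0\end{pmatrix} - \begin{pmatrix}b_{1,i+1}\\ \vdots\\ b_{i,i+1}\\ 0\\ 0\\ \vdots\\ 0\end{pmatrix} = \begin{pmatrix}0\\ \vdots\\ 0\\ b_{i+1,i+1}\\ 0\\ \vdots\\ 0\end{pmatrix}.$$
Thus the proposition holds.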


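Next, we discuss $q$-ary lattices, where $q > 1$ is a positive integer. The following two $q$-ary lattices are often used in lattice cryptosystems.

Definition 7.9 Let $\mathbb{Z}_q$ be the residue class ring mod $q$ and $A \in \mathbb{Z}_q^{n\times m}$. The following two $q$-ary lattices are defined as
$$\Lambda_q(A) = \{y \in \mathbb{Z}^m \mid \text{there is } x \in \mathbb{Z}^n \text{ such that } y \equiv A'x \pmod q\}, \tag{7.45}$$
and
$$\Lambda_q^\perp(A) = \{y \in \mathbb{Z}^m \mid Ay \equiv 0 \pmod q\}. \tag{7.46}$$

By definition, $\Lambda_q(A) \subset \mathbb{Z}^m$ and $\Lambda_q^\perp(A) \subset \mathbb{Z}^m$ are $m$-dimensional integer lattices. Moreover, for any $\alpha \in q\mathbb{Z}^m$, taking $x = 0 \in \mathbb{Z}^n$ gives $\alpha \equiv A'x \pmod q$, and $A\alpha \equiv 0 \pmod q$; hence
$$q\mathbb{Z}^m \subset \Lambda_q(A) \subset \mathbb{Z}^m,\qquad q\mathbb{Z}^m \subset \Lambda_q^\perp(A) \subset \mathbb{Z}^m.$$
That is, $\Lambda_q(A)$ and $\Lambda_q^\perp(A)$ are $q$-ary lattices of dimension $m$.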
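Lemma 7.27 We have
$$\Lambda_q^\perp(A) = q\Lambda_q(A)^*,\qquad \Lambda_q(A) = q\Lambda_q^\perp(A)^*.$$

Proof For any $\alpha \in \Lambda_q(A)^*$, by definition,
$$\langle y, \alpha\rangle \in \mathbb{Z},\quad \forall y \in \Lambda_q(A),$$
that is,
$$\langle y, \alpha\rangle = y'\alpha \in \mathbb{Z} \Rightarrow y'\alpha \equiv 0 \pmod 1.$$
Hence
$$y'(q\alpha) \equiv 0 \pmod q,\quad \forall y \in \Lambda_q(A).$$
Note that $q\alpha \in \mathbb{Z}^m$, because $qe_i \in q\mathbb{Z}^m \subset \Lambda_q(A)$ gives $\langle qe_i, \alpha\rangle \in \mathbb{Z}$. In particular, $y = A'x \in \Lambda_q(A)$ for every $x \in \mathbb{Z}^n$, so the above formula gives
$$x'A(q\alpha) \equiv 0 \pmod q,\quad \forall x \in \mathbb{Z}^n.$$
Thus
$$A(q\alpha) \equiv 0 \pmod q \Rightarrow q\alpha \in \Lambda_q^\perp(A).$$
We have proved
$$q\Lambda_q(A)^* \subset \Lambda_q^\perp(A).$$
Conversely, if $y \in \Lambda_q^\perp(A)$, we have
$$Ay \equiv 0 \pmod q \Rightarrow A\left(\frac{1}{q}y\right) \equiv 0 \pmod 1.$$
For any $\alpha \in \Lambda_q(A)$, let $x \in \mathbb{Z}^n$ with $\alpha \equiv A'x \pmod q$, and write $\alpha = A'x + qz$ with $z \in \mathbb{Z}^m$; then
$$\left\langle\alpha, \frac{1}{q}y\right\rangle = x'A\left(\frac{1}{q}y\right) + z'y \equiv 0 \pmod 1.$$
Hence
$$\frac{1}{q}y \in \Lambda_q(A)^* \Rightarrow y \in q\Lambda_q(A)^*,$$
that is,
$$\Lambda_q^\perp(A) \subset q\Lambda_q(A)^*.$$
Thus $\Lambda_q^\perp(A) = q\Lambda_q(A)^*$. The second equation can be proved similarly.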
Lemma 7.28 Let $q$ be a prime and $A \in \mathbb{Z}_q^{n\times m}$ with $m \ge n$ and $\operatorname{rank}(A) = n$. Then
$$|\det(\Lambda_q^\perp(A))| = q^n, \tag{7.47}$$
and
$$|\det(\Lambda_q(A))| = q^{m-n}. \tag{7.48}$$

Proof In the finite field $\mathbb{Z}_q$, $\operatorname{rank}(A) = n$, so the linear system $Ay = 0$ has exactly $q^{m-n}$ solutions in $\mathbb{Z}_q^m$, from which we get
$$|\Lambda_q^\perp(A)/q\mathbb{Z}^m| = q^{m-n}.$$
By Lemma 7.23,
$$|\mathbb{Z}^m/\Lambda_q^\perp(A)| = |\mathbb{Z}_q^m/(\Lambda_q^\perp(A)/q\mathbb{Z}^m)| = q^m/q^{m-n} = q^n.$$
By Lemma 7.24,
$$|\det(\Lambda_q^\perp(A))| = |\mathbb{Z}^m/\Lambda_q^\perp(A)| = q^n.$$
So (7.47) holds. By Corollary 7.5 of the previous section, we have
$$|\det(\Lambda_q^\perp(A)^*)| = q^{-n}.$$
By Lemma 7.27,
$$|\det(\Lambda_q(A))| = q^m|\det(\Lambda_q^\perp(A)^*)| = q^{m-n}.$$
The Lemma holds.
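For tiny parameters, Lemma 7.28 can be verified by brute force. The sketch counts the solutions of $Ay \equiv 0 \pmod q$ and the distinct vectors $A'x \bmod q$, then uses $\det = q^m/|\Lambda/q\mathbb{Z}^m|$, which follows from Lemmas 7.23 and 7.24. The example matrix is an arbitrary choice of rank $2$ over $\mathbb{Z}_5$, and the function names are hypothetical.

```python
import itertools

def kernel_size_mod_q(A, q):
    """|Lambda_q^perp(A) / qZ^m|: number of y in Z_q^m with A y = 0 (mod q)."""
    n, m = len(A), len(A[0])
    return sum(
        1
        for y in itertools.product(range(q), repeat=m)
        if all(sum(A[i][j] * y[j] for j in range(m)) % q == 0 for i in range(n))
    )

def image_size_mod_q(A, q):
    """|Lambda_q(A) / qZ^m|: number of distinct vectors A'x mod q, x in Z_q^n."""
    n, m = len(A), len(A[0])
    return len({
        tuple(sum(A[i][j] * x[i] for i in range(n)) % q for j in range(m))
        for x in itertools.product(range(q), repeat=n)
    })

q, A = 5, [[1, 2, 3], [0, 1, 4]]   # n = 2, m = 3, rank(A) = 2 over Z_5
m, n = 3, 2
# det = |Z^m / Lambda| = q^m / |Lambda / qZ^m|
print(q**m // kernel_size_mod_q(A, q), q**m // image_size_mod_q(A, q))   # 25 5, i.e. q^n and q^(m-n)
```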


7.4 Reduced Basis 


In lattice theory, the Reduced basis and the corresponding LLL algorithm are among the most important topics. They have had an important impact on computational algebra, computational number theory and other fields, and are recognized as one of the most important computational methods of the past 100 years. In order to introduce the Reduced basis and the LLL algorithm, we recall the Gram-Schmidt orthogonalization process summarized by Eqs. (7.31)-(7.34). Let $\{\beta_1,\beta_2,\dots,\beta_n\}\subset\mathbb{R}^n$ be a basis of $\mathbb{R}^n$ and $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding Gram-Schmidt orthogonal basis, where
$$\beta_1^*=\beta_1,\qquad\beta_i^*=\beta_i-\sum_{j=1}^{i-1}\frac{\langle\beta_i,\beta_j^*\rangle}{\langle\beta_j^*,\beta_j^*\rangle}\beta_j^*,\quad1<i\le n.\tag{7.49}$$


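The above formula can be written as
$$\beta_i=\beta_i^*+\sum_{j=1}^{i-1}\frac{\langle\beta_i,\beta_j^*\rangle}{\langle\beta_j^*,\beta_j^*\rangle}\beta_j^*,\quad1\le i\le n.\tag{7.50}$$
We then have the following lemma.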


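Lemma 7.29 Let $\{\beta_1,\beta_2,\dots,\beta_n\}$ be a basis of $\mathbb{R}^n$, $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding Gram-Schmidt orthogonal basis, and $L(\beta_1,\beta_2,\dots,\beta_k)=\mathrm{Span}\{\beta_1,\beta_2,\dots,\beta_k\}$ the linear subspace spanned by $\beta_1,\beta_2,\dots,\beta_k$. Then
(i)
$$L(\beta_1,\beta_2,\dots,\beta_k)=L(\beta_1^*,\beta_2^*,\dots,\beta_k^*),\quad1\le k\le n.\tag{7.51}$$
(ii) For $1\le i\le n$,
$$\begin{cases}\langle\beta_i,\beta_k^*\rangle=0, & \text{when } k>i;\\ \langle\beta_i,\beta_k^*\rangle=\langle\beta_k^*,\beta_k^*\rangle, & \text{when } k=i.\end{cases}\tag{7.52}$$
(iii) For any $x\in\mathbb{R}^n$, if $x=\sum_{i=1}^nx_i\beta_i^*$, then
$$x_i=\frac{\langle x,\beta_i^*\rangle}{\langle\beta_i^*,\beta_i^*\rangle},\quad1\le i\le n.\tag{7.53}$$
Proof The above three properties can be derived directly from Eq. (7.49) or (7.50).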


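Let $U=(u_{ij})_{n\times n}$, where
$$u_{ij}=\frac{\langle\beta_i,\beta_j^*\rangle}{\langle\beta_j^*,\beta_j^*\rangle}\ \text{when } j<i;\qquad u_{ij}=0\ \text{when } j>i;\qquad u_{ii}=1.\tag{7.54}$$
Therefore, $U$ is a lower triangular matrix with 1 on the diagonal, and
$$\begin{pmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_n\end{pmatrix}=U\begin{pmatrix}\beta_1^*\\ \beta_2^*\\ \vdots\\ \beta_n^*\end{pmatrix}.\tag{7.55}$$
$U$ is called the coefficient matrix of the orthogonalization of $\{\beta_1,\beta_2,\dots,\beta_n\}$.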


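Let us introduce the concept of orthogonal projection. Suppose $V\subset\mathbb{R}^k\subset\mathbb{R}^n$ ($1\le k\le n$); the orthogonal complement $V^{\perp}$ of $V$ in $\mathbb{R}^k$ is
$$V^{\perp}=\{x\in\mathbb{R}^k\mid\langle x,a\rangle=0,\ \forall\,a\in V\}.\tag{7.56}$$
Because $\mathbb{R}^k=V\oplus V^{\perp}$, every $x\in\mathbb{R}^k$ can be uniquely expressed as
$$x=\alpha+\beta,\quad\alpha\in V,\ \beta\in V^{\perp}.$$
$\alpha$ is called the orthogonal projection of $x$ on the subspace $V$; obviously $|x|^2=|\alpha|^2+|\beta|^2$.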


Lemma 7.30 Let $\{\beta_1,\beta_2,\dots,\beta_n\}$ be a basis of $\mathbb{R}^n$ and $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding orthogonal basis, $1\le k\le n$. Then $\beta_k^*$ is the orthogonal projection of $\beta_k$ on the orthogonal complement $V$ of the subspace $L(\beta_1,\beta_2,\dots,\beta_{k-1})$ in $L(\beta_1,\beta_2,\dots,\beta_k)$.
Proof When $k=1$, the proposition is trivial. If $k>1$, then by Lemma 7.29,
$$L(\beta_1,\beta_2,\dots,\beta_{k-1})=L(\beta_1^*,\beta_2^*,\dots,\beta_{k-1}^*).$$
Therefore, the orthogonal complement $V=L(\beta_k^*)$ of $L(\beta_1,\beta_2,\dots,\beta_{k-1})$ in $L(\beta_1,\beta_2,\dots,\beta_{k-1},\beta_k)$ is a one-dimensional space. Because
$$\beta_k=\beta_k^*+\sum_{j=1}^{k-1}u_{kj}\beta_j^*,$$
where
$$\sum_{j=1}^{k-1}u_{kj}\beta_j^*\in L(\beta_1,\dots,\beta_{k-1})\quad\text{and}\quad\left\langle\beta_k^*,\sum_{j=1}^{k-1}u_{kj}\beta_j^*\right\rangle=0,$$
$\beta_k^*$ is the orthogonal projection of $\beta_k$ on $V$. The Lemma holds.


Next, we discuss how the corresponding orthogonal basis changes when elementary column transformations are applied to the basis matrix $[\beta_1,\beta_2,\dots,\beta_n]$.


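Lemma 7.31 Let $\{\beta_1,\beta_2,\dots,\beta_n\}\subset\mathbb{R}^n$ be a basis, $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding orthogonal basis, and $A=(u_{ij})_{n\times n}$ the coefficient matrix. Exchange $\beta_{k-1}$ with $\beta_k$ ($2\le k\le n$) to get a basis $\{\alpha_1,\alpha_2,\dots,\alpha_n\}$ of $\mathbb{R}^n$, where
$$\alpha_{k-1}=\beta_k,\quad\alpha_k=\beta_{k-1},\quad\alpha_i=\beta_i\ \text{when } i\neq k-1,k.$$
Let $\{\alpha_1^*,\alpha_2^*,\dots,\alpha_n^*\}$ be the corresponding orthogonal basis and $A_1=(v_{ij})_{n\times n}$ the corresponding coefficient matrix. Then we have
(i) $\alpha_i^*=\beta_i^*$ if $i\neq k-1,k$.
(ii)
$$\alpha_{k-1}^*=\beta_k^*+u_{k,k-1}\beta_{k-1}^*,\qquad\alpha_k^*=\beta_{k-1}^*-v_{k,k-1}\alpha_{k-1}^*.$$
(iii) $v_{ij}=u_{ij}$ if $1\le j<i\le n$ and $\{i,j\}\cap\{k-1,k\}=\varnothing$.
(iv) For $k<i\le n$,
$$v_{i,k-1}=u_{i,k-1}v_{k,k-1}+u_{ik}\frac{|\beta_k^*|^2}{|\alpha_{k-1}^*|^2},\qquad v_{ik}=u_{i,k-1}-u_{ik}u_{k,k-1}.$$
(v) $v_{k-1,j}=u_{kj}$ and $v_{kj}=u_{k-1,j}$, $1\le j<k-1$.
Proof If $1\le i<k-1$ or $k<i\le n$, then
$$L(\alpha_1,\alpha_2,\dots,\alpha_i)=L(\beta_1,\beta_2,\dots,\beta_i),$$
and in this space the orthogonal complement $V$ of
$$L(\alpha_1,\alpha_2,\dots,\alpha_{i-1})=L(\beta_1,\beta_2,\dots,\beta_{i-1})$$
is the same. Therefore, the orthogonal projection of $\alpha_i=\beta_i$ on $V$ is the same as that of $\beta_i$ on $V$, that is, $\alpha_i^*=\beta_i^*$ ($i\neq k-1,k$), so (i) holds.
To prove (ii): $\alpha_{k-1}^*$ is the orthogonal projection of $\beta_k\,(=\alpha_{k-1})$ on the orthogonal complement of $L(\beta_1,\beta_2,\dots,\beta_{k-2})$ in $L(\beta_1,\beta_2,\dots,\beta_k)$. Because
$$\beta_k^*=\beta_k-\sum_{j=1}^{k-1}u_{kj}\beta_j^*=\beta_k-u_{k,k-1}\beta_{k-1}^*-\sum_{j=1}^{k-2}u_{kj}\beta_j^*,$$
and $L(\beta_1,\beta_2,\dots,\beta_{k-2})=L(\beta_1^*,\beta_2^*,\dots,\beta_{k-2}^*)$, there is
$$\alpha_{k-1}^*=\beta_k^*+u_{k,k-1}\beta_{k-1}^*.$$
Similarly, $\alpha_k^*$ is the orthogonal projection of $\beta_{k-1}\,(=\alpha_k)$ on the orthogonal complement of $L(\alpha_1,\dots,\alpha_{k-1})$; since the projection of $\beta_{k-1}$ orthogonal to $L(\beta_1,\dots,\beta_{k-2})$ is $\beta_{k-1}^*$, we obtain
$$\alpha_k^*=\beta_{k-1}^*-v_{k,k-1}\alpha_{k-1}^*,$$
where
$$v_{k,k-1}=\frac{\langle\beta_{k-1},\alpha_{k-1}^*\rangle}{|\alpha_{k-1}^*|^2}=\frac{\langle\beta_{k-1},\beta_k^*+u_{k,k-1}\beta_{k-1}^*\rangle}{|\alpha_{k-1}^*|^2}=u_{k,k-1}\frac{|\beta_{k-1}^*|^2}{|\alpha_{k-1}^*|^2},$$
thus (ii) holds. Similarly, the other properties can be proved. Lemma 7.31 holds.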


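Lemma 7.32 Let $\{\beta_1,\beta_2,\dots,\beta_n\}$ be a basis of $\mathbb{R}^n$, $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding orthogonal basis, and $A=(u_{ij})_{n\times n}$ the coefficient matrix. For any $k\ge2$, if we replace $\beta_k$ with $\beta_k-r\beta_{k-1}$ and keep the other $\beta_i$ ($i\neq k$) unchanged, we get a new basis
$$\{\alpha_1,\alpha_2,\dots,\alpha_n\}=\{\beta_1,\beta_2,\dots,\beta_{k-1},\beta_k-r\beta_{k-1},\beta_{k+1},\dots,\beta_n\}.$$
Let $\{\alpha_1^*,\alpha_2^*,\dots,\alpha_n^*\}$ be the corresponding orthogonal basis and $A_1=(v_{ij})_{n\times n}$ the corresponding coefficient matrix. Then we have
(i) $\alpha_i^*=\beta_i^*$ for all $1\le i\le n$, that is, the orthogonal basis remains unchanged.
(ii) $v_{ij}=u_{ij}$ if $1\le j<i\le n$ and $i\neq k$.
(iii)
$$v_{kj}=u_{kj}-ru_{k-1,j}\ \text{if } j<k-1;\qquad v_{k,k-1}=u_{k,k-1}-r\ \text{if } j=k-1.$$
Proof When $i<k$ or $i>k$, $\alpha_i^*=\beta_i^*$ is trivial, so to prove (i) we only need to consider $i=k$. $\alpha_k^*$ is the orthogonal projection of $\alpha_k=\beta_k-r\beta_{k-1}$ on the orthogonal complement $L(\beta_k^*)$ of $L(\beta_1,\beta_2,\dots,\beta_{k-1})=L(\alpha_1,\alpha_2,\dots,\alpha_{k-1})$. Since
$$\beta_k^*=\beta_k-\sum_{j=1}^{k-1}u_{kj}\beta_j^*=\alpha_k-\left(\sum_{j=1}^{k-2}(u_{kj}-ru_{k-1,j})\beta_j^*+(u_{k,k-1}-r)\beta_{k-1}^*\right),$$
and the bracket lies in $L(\beta_1^*,\dots,\beta_{k-1}^*)$, this proves $\alpha_k^*=\beta_k^*$. Thus (i) holds. To prove (ii): when $i\neq k$, we have
$$v_{ij}=\frac{\langle\alpha_i,\alpha_j^*\rangle}{|\alpha_j^*|^2}=\frac{\langle\beta_i,\beta_j^*\rangle}{|\beta_j^*|^2}=u_{ij},$$
that is, (ii) holds. When $i=k$,
$$v_{kj}=\frac{\langle\alpha_k,\alpha_j^*\rangle}{|\alpha_j^*|^2}=\frac{\langle\beta_k-r\beta_{k-1},\beta_j^*\rangle}{|\beta_j^*|^2}=\frac{\langle\beta_k,\beta_j^*\rangle}{|\beta_j^*|^2}-r\frac{\langle\beta_{k-1},\beta_j^*\rangle}{|\beta_j^*|^2}=u_{kj}-ru_{k-1,j}.$$
The above formula holds for all $1\le j\le k-1$ (with $u_{k-1,k-1}=1$), thus (iii) holds. The Lemma holds.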


Next, we introduce the concept of a Reduced basis of $\mathbb{R}^n$.
Definition 7.10 Let $\{\beta_1,\beta_2,\dots,\beta_n\}\subset\mathbb{R}^n$ be a basis, $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding orthogonal basis, and $A=(u_{ij})_{n\times n}$ the coefficient matrix. $\{\beta_1,\beta_2,\dots,\beta_n\}$ is called a Reduced basis of $\mathbb{R}^n$ if
$$\begin{cases}|u_{ij}|\le\frac12, & \forall\,1\le j<i\le n,\\[2pt] \frac34|\beta_{k-1}^*|^2\le|\beta_k^*-u_{k,k-1}\beta_{k-1}^*|^2, & \forall\,1<k\le n.\end{cases}\tag{7.57}$$
A Reduced basis of $\mathbb{R}^n$ is sometimes called a Lovász Reduced basis, and it is of great significance in lattice theory. The main result of this section is that any lattice $L$ in $\mathbb{R}^n$ has a Reduced basis, and the method to calculate a Reduced basis is the famous LLL algorithm.


Theorem 7.3 Let $L\subset\mathbb{R}^n$ be a lattice (full-rank lattice). Then there is a generating matrix $B=[\beta_1,\beta_2,\dots,\beta_n]$ of $L$ such that $\{\beta_1,\beta_2,\dots,\beta_n\}$ is a Reduced basis of $\mathbb{R}^n$; it is then also called a Reduced basis of the lattice $L=L(B)$.
Proof Let $B=[\beta_1,\beta_2,\dots,\beta_n]$, $L=L(B)$. First we prove that we can achieve
$$|u_{k,k-1}|\le\frac12,\quad1<k\le n.\tag{7.58}$$
If there is a $k>1$ for which the above formula does not hold, let $r$ be the nearest integer to $u_{k,k-1}$; obviously,
$$|u_{k,k-1}-r|\le\frac12.$$
In $\{\beta_1,\beta_2,\dots,\beta_n\}$, replace $\beta_k$ with $\beta_k-r\beta_{k-1}$; then by Lemma 7.32,
$$u_{kj}\to u_{kj}-ru_{k-1,j},\quad1\le j<k.$$
In particular, when $j=k-1$,
$$u_{k,k-1}\to u_{k,k-1}-r.$$
Under the new basis, all $\beta_i^*$ and all $u_{ij}$ ($1\le j<i\le n$, $i\neq k$) remain unchanged, so Eq. (7.58) holds under the new basis.
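In the second step of the LLL algorithm, we prove that we can achieve
$$|\beta_k^*-u_{k,k-1}\beta_{k-1}^*|^2\ge\frac34|\beta_{k-1}^*|^2,\quad1<k\le n.\tag{7.59}$$
Since $\beta_k^*$ and $\beta_{k-1}^*$ are orthogonal,
$$|\beta_k^*+u_{k,k-1}\beta_{k-1}^*|^2=|\beta_k^*-u_{k,k-1}\beta_{k-1}^*|^2=|\beta_k^*|^2+u_{k,k-1}^2|\beta_{k-1}^*|^2.$$
Therefore, the sign inside the absolute value on the left of Eq. (7.59) can be changed arbitrarily. If there is a $k$, $1<k\le n$, such that (7.59) does not hold, that is,
$$|\beta_k^*+u_{k,k-1}\beta_{k-1}^*|^2<\frac34|\beta_{k-1}^*|^2,\tag{7.60}$$
then we exchange $\beta_{k-1}$ and $\beta_k$ and keep the other $\beta_i$ unchanged. This gives a new basis $\{\alpha_1,\alpha_2,\dots,\alpha_n\}$ with corresponding orthogonal basis $\{\alpha_1^*,\alpha_2^*,\dots,\alpha_n^*\}$ and coefficient matrix $A_1=(v_{ij})_{n\times n}$, where
$$\alpha_i=\beta_i\ (i\neq k-1,k),\qquad\alpha_{k-1}=\beta_k,\qquad\alpha_k=\beta_{k-1}.$$
Let us prove that under the new basis $\{\alpha_1,\alpha_2,\dots,\alpha_n\}$ there is
$$|\alpha_k^*+v_{k,k-1}\alpha_{k-1}^*|^2\ge\frac34|\alpha_{k-1}^*|^2.\tag{7.61}$$
By Lemma 7.31,
$$\alpha_{k-1}^*=\beta_k^*+u_{k,k-1}\beta_{k-1}^*,\qquad\alpha_k^*=\beta_{k-1}^*-v_{k,k-1}\alpha_{k-1}^*,$$
so $\alpha_k^*+v_{k,k-1}\alpha_{k-1}^*=\beta_{k-1}^*$. By (7.60), we have
$$|\alpha_{k-1}^*|^2<\frac34|\alpha_k^*+v_{k,k-1}\alpha_{k-1}^*|^2.$$
That is,
$$|\alpha_k^*+v_{k,k-1}\alpha_{k-1}^*|^2>\frac43|\alpha_{k-1}^*|^2>\frac34|\alpha_{k-1}^*|^2.$$
Thus (7.61) holds. However, the exchange at position $k$ replaces $\beta_{k-1}^*$ by $\alpha_{k-1}^*=\beta_k^*+u_{k,k-1}\beta_{k-1}^*$, so condition (7.59) at position $k-1$ has to be checked again. By Lemma 7.31, $\alpha_{k-2}^*=\beta_{k-2}^*$ and $v_{k-1,k-2}=u_{k,k-2}$, hence
$$|\alpha_{k-1}^*+v_{k-1,k-2}\alpha_{k-2}^*|^2=|\beta_k^*|^2+u_{k,k-1}^2|\beta_{k-1}^*|^2+u_{k,k-2}^2|\beta_{k-2}^*|^2,$$
which is compared with $\frac34|\beta_{k-2}^*|^2$; if (7.59) fails there, the reduction (7.58) and, if necessary, another exchange are carried out at position $k-1$. This process stops after finitely many exchanges. Indeed, by (7.65) below, an exchange at position $k$ leaves every $d_j$ with $j\neq k-1$ unchanged and replaces $d_{k-1}$ by
$$d_{k-1}\frac{|\alpha_{k-1}^*|^2}{|\beta_{k-1}^*|^2}<\frac34d_{k-1},$$
so the quantity $D$ of (7.66) decreases by a factor less than $\frac34$, while the replacements of the first and third steps change no $\beta_i^*$ and hence do not change $D$. Since $D$ has a positive lower bound (Lemma 7.34 below), only finitely many exchanges can occur, and when no further exchange is possible, Eq. (7.59) holds for all $k$, $1<k\le n$.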
In the third step of the LLL algorithm, we prove that we can achieve
$$|u_{kj}|\le\frac12,\quad\forall\,1\le j<k\le n.\tag{7.62}$$
When $j=k-1$, (7.62) is just (7.58). For a given $k$, $1<k\le n$, if (7.62) does not hold, let $l$ be the largest subscript such that $|u_{kl}|>\frac12$, and let $r$ be the nearest integer to $u_{kl}$; then $|u_{kl}-r|\le\frac12$. Replace $\beta_k$ with $\beta_k-r\beta_l$. As in Lemma 7.32, all $\beta_i^*$ remain unchanged and the coefficient matrix changes to
$$u_{kj}\to u_{kj}-ru_{lj},\quad1\le j<l;\qquad u_{kl}\to u_{kl}-r,$$
while the other $u_{ij}$ remain unchanged. At this time,
$$|u_{kl}-r|\le\frac12.$$
Since the entries $u_{kj}$ with $j>l$ are not affected, we repeat this for the next largest such subscript and obtain Eq. (7.62) for all $1\le j<k\le n$.

The above matrix transformations are equivalent to multiplying $B$ from the right by unimodular matrices, so a Reduced basis $B$ of the lattice $L=L(B)$ is finally obtained. We complete the proof of Theorem 7.3.


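Lemma 7.33 Let $L=L(B)$ be a lattice, $B$ a Reduced basis of $L$, and $B^*=[\beta_1^*,\beta_2^*,\dots,\beta_n^*]$ the corresponding orthogonal basis. Then for any $1\le j\le i\le n$, we have
$$|\beta_j^*|^2\le2^{i-j}|\beta_i^*|^2.$$
Proof Because $B=[\beta_1,\beta_2,\dots,\beta_n]$ is a Reduced basis,
$$|\beta_k^*+u_{k,k-1}\beta_{k-1}^*|^2\ge\frac34|\beta_{k-1}^*|^2.$$
Thus
$$|\beta_k^*|^2+u_{k,k-1}^2|\beta_{k-1}^*|^2\ge\frac34|\beta_{k-1}^*|^2.$$
Since $|u_{k,k-1}|\le\frac12$, there is
$$|\beta_k^*|^2\ge\left(\frac34-u_{k,k-1}^2\right)|\beta_{k-1}^*|^2\ge\left(\frac34-\frac14\right)|\beta_{k-1}^*|^2=\frac12|\beta_{k-1}^*|^2.$$
So, for given $1\le j\le i\le n$, we have
$$|\beta_i^*|^2\ge\frac12|\beta_{i-1}^*|^2\ge\cdots\ge\left(\frac12\right)^{i-j}|\beta_j^*|^2,$$
thus
$$|\beta_j^*|^2\le2^{i-j}|\beta_i^*|^2.$$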
Remark 7.3 In the definition of a Reduced basis, the coefficient $\frac34$ on the left of the second inequality of (7.57) can be replaced by any $\delta$ with $\frac14<\delta<1$. In particular, Babai (1986) pointed out that the second inequality of Eq. (7.57) can be replaced by the following weaker inequality:
$$|\beta_k^*|^2\ge\frac12|\beta_{k-1}^*|^2,\quad1<k\le n.\tag{7.63}$$


Let us discuss the computational complexity of the LLL algorithm. Let $B=\{\beta_1,\beta_2,\dots,\beta_n\}$ be any basis; for $0\le k\le n$, we define
$$d_0=1,\qquad d_k=\det\big((\langle\beta_i,\beta_j\rangle)_{k\times k}\big).\tag{7.64}$$
If $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ is the orthogonal basis corresponding to $\{\beta_1,\beta_2,\dots,\beta_n\}$, there is obviously
$$d_k=\prod_{i=1}^k|\beta_i^*|^2,\quad0\le k\le n.\tag{7.65}$$
Thus $d_k$ is a positive number, and $d_n=d(L)^2$. Let
$$D=\prod_{k=1}^{n-1}d_k.\tag{7.66}$$


We first prove that $d_k$ ($0\le k\le n$) and $D$ have lower bounds.


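Lemma 7.34 Let
$$m(L)=\lambda(L)^2=\min\{|x|^2: x\in L,\ x\neq0\}.$$
Then
$$d_k\ge\left(\frac34\right)^{\frac{k(k-1)}{2}}m(L)^k,\quad1\le k\le n.$$
Proof The determinant of the $k$-dimensional lattice $L_k=L(\beta_1,\beta_2,\dots,\beta_k)$ ($1\le k\le n$) satisfies
$$d^2(L_k)=d_k.$$
By the conclusion of Cassels (1971), there is a nonzero lattice point $x\in L_k$, $x\neq0$, which satisfies
$$|x|^2\le\left(\frac43\right)^{\frac{k-1}{2}}d_k^{\frac1k}.\tag{7.67}$$
Since $L_k\subset L$, we have $m(L)\le|x|^2$, and then
$$m(L)^k\le\left(\frac43\right)^{\frac{k(k-1)}{2}}d_k,\quad\text{that is,}\quad d_k\ge\left(\frac34\right)^{\frac{k(k-1)}{2}}m(L)^k.$$
The Lemma holds.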


Another important conclusion of this section concerns integer lattices: we estimate the computational complexity of obtaining a Reduced basis of an integer lattice by the LLL algorithm, and prove that the LLL algorithm on integer lattices runs in polynomial time.


Theorem 7.4 Let $L=L(B)\subset\mathbb{Z}^n$ be an integer lattice with generating matrix $B=[\beta_1,\beta_2,\dots,\beta_n]$, and suppose $N$ satisfies
$$\max_{1\le i\le n}|\beta_i|^2\le N.$$
Then the computational complexity of obtaining a Reduced basis of $L$ from $B$ by the LLL algorithm is
$$\mathrm{Time}(\text{LLL algorithm})=O(n^4\log N)\ \text{arithmetic operations}.$$
The binary lengths of all integers in the LLL algorithm are $O(n\log N)$, so the computational complexity of the LLL algorithm on integer lattices is polynomial.


Proof By (7.36), we have
$$|\beta_i^*|^2\le|\beta_i|^2,\quad1\le i\le n,$$
where $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ is the orthogonal basis corresponding to $\{\beta_1,\beta_2,\dots,\beta_n\}$. Then by (7.65) and (7.66), we have
$$d_k=\prod_{i=1}^k|\beta_i^*|^2\le\prod_{i=1}^k|\beta_i|^2\le N^k,\quad1\le k\le n,$$
and
$$1\le D\le N^{\frac{n(n-1)}{2}}.\tag{7.68}$$
The inequality on the left of the above formula holds because $d_k$ is the determinant of an integer Gram matrix, so $d_k\in\mathbb{Z}$ and $d_k\ge1$; by (7.66), $D\ge1$. Each exchange in the second step multiplies $D$ by a factor less than $\frac34$ (by (7.60) and (7.65), it replaces $d_{k-1}$ by $d_{k-1}|\alpha_{k-1}^*|^2/|\beta_{k-1}^*|^2$ and leaves the other $d_j$ unchanged), so the number of exchanges is at most $\log_{4/3}D=O(n^2\log N)$. Moreover, $O(n)$ arithmetic operations are required in the first step of the LLL algorithm and $O(n^2)$ arithmetic operations in the second and third steps, thus
$$\mathrm{Time}(\text{LLL algorithm})\le O(n^2)\cdot O(\log D)=O(n^4\log N).$$
Therefore, the first conclusion of Theorem 7.4 is proved. The second conclusion is more complex, and we omit it. Interested readers can refer to the original paper (1982) of A. K. Lenstra, H. W. Lenstra and L. Lovász.


7.5 Approximation of SVP and CVP 


The most important application of the lattice Reduced basis and the LLL algorithm is to provide approximation algorithms for the shortest vector problem (SVP) and the closest vector problem (CVP), and to obtain some approximation results. First, we prove the following Lemma.


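Lemma 7.35 Let $\{\beta_1,\beta_2,\dots,\beta_n\}$ be a Reduced basis of a lattice $L$, $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding orthogonal basis, and $d(L)$ the determinant of $L$. Then we have
(i)
$$d(L)\le\prod_{i=1}^n|\beta_i|\le2^{\frac{n(n-1)}{4}}d(L).\tag{7.69}$$
(ii)
$$|\beta_1|\le2^{\frac{n-1}{4}}d(L)^{\frac1n}.\tag{7.70}$$
Proof The inequality on the left of (i), called the Hadamard inequality, has been given by Lemma 7.17. The inequality on the right of (i) gives an upper bound of $\prod_{i=1}^n|\beta_i|$. By Lemma 7.33,
$$|\beta_j^*|^2\le2^{i-j}|\beta_i^*|^2,\quad1\le j\le i\le n.\tag{7.71}$$
Since
$$\beta_i=\beta_i^*+\sum_{j=1}^{i-1}u_{ij}\beta_j^*,$$
we get
$$|\beta_i|^2=|\beta_i^*|^2+\sum_{j=1}^{i-1}u_{ij}^2|\beta_j^*|^2\le|\beta_i^*|^2+\frac14\sum_{j=1}^{i-1}2^{i-j}|\beta_i^*|^2=\left(1+\frac14(2^i-2)\right)|\beta_i^*|^2\le2^{i-1}|\beta_i^*|^2.\tag{7.72}$$
There is
$$\prod_{i=1}^n|\beta_i|^2\le\prod_{i=1}^n2^{i-1}|\beta_i^*|^2=2^{\frac{n(n-1)}{2}}\prod_{i=1}^n|\beta_i^*|^2=2^{\frac{n(n-1)}{2}}d(L)^2.$$
So
$$\prod_{i=1}^n|\beta_i|\le2^{\frac{n(n-1)}{4}}d(L),$$
and (7.69) holds. To prove (ii), by (7.72) and (7.71),
$$|\beta_j|^2\le2^{j-1}|\beta_j^*|^2\le2^{i-1}|\beta_i^*|^2\tag{7.73}$$
for all $1\le j\le i\le n$; in particular,
$$|\beta_1|^2\le2^{i-1}|\beta_i^*|^2,\quad1\le i\le n.$$
Thus
$$|\beta_1|^{2n}\le\prod_{i=1}^n2^{i-1}|\beta_i^*|^2=2^{\frac{n(n-1)}{2}}d(L)^2.$$
So
$$|\beta_1|\le2^{\frac{n-1}{4}}d(L)^{\frac1n}.$$
Lemma 7.35 holds.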

The following theorem shows that if $\{\beta_1,\beta_2,\dots,\beta_n\}$ is a Reduced basis of a lattice $L$, then $\beta_1$ is an approximation of the shortest vector $u_0$ of $L$, with approximation coefficient $\gamma_n=2^{\frac{n-1}{2}}$.


Theorem 7.5 Let $L=L(B)\subset\mathbb{R}^n$ be a lattice (full-rank lattice), $B=[\beta_1,\beta_2,\dots,\beta_n]$ a Reduced basis of $L$, and $\lambda_1=\lambda(L)$ the minimal distance of $L$. Then
$$|\beta_1|\le2^{\frac{n-1}{2}}\lambda_1.\tag{7.74}$$
Proof We only need to prove that
$$|\beta_1|^2\le2^{n-1}|x|^2,\quad\forall\,x\in L,\ x\neq0.\tag{7.75}$$
For given $x\in L$, $x\neq0$, let
$$x=\sum_{i=1}^nr_i\beta_i=\sum_{i=1}^nr_i^*\beta_i^*,\quad r_i\in\mathbb{Z},\ r_i^*\in\mathbb{R},\ 1\le i\le n.$$
Let $k$ be the largest subscript with $r_k\neq0$; then $r_k^*=r_k$ (see Remark 7.4). So, by Lemma 7.33,
$$|x|^2\ge r_k^2|\beta_k^*|^2\ge|\beta_k^*|^2\ge2^{1-k}|\beta_1|^2.\tag{7.76}$$
Thus
$$|\beta_1|^2\le2^{k-1}|x|^2\le2^{n-1}|x|^2,\quad x\in L,\ x\neq0.$$
That is, (7.75) holds, and Theorem 7.5 holds.


The following results show that not only $\beta_1$, but every vector of a Reduced basis approximates the corresponding successive shortest vector of the lattice.


Lemma 7.36 Let $L\subset\mathbb{R}^n$ be a lattice, $\{\beta_1,\beta_2,\dots,\beta_n\}$ a Reduced basis of $L$, and $\{x_1,x_2,\dots,x_t\}\subset L$ a set of $t$ linearly independent lattice points. Then
$$|\beta_j|^2\le2^{n-1}\max\{|x_1|^2,|x_2|^2,\dots,|x_t|^2\}\tag{7.77}$$
holds for all $1\le j\le t$.
Proof Write
$$x_j=\sum_{i=1}^nr_{ij}\beta_i,\quad r_{ij}\in\mathbb{Z},\ 1\le j\le t.$$
For fixed $j$, let $i(j)$ be the largest positive integer $i$ such that $r_{ij}\neq0$; by (7.76), we have
$$|x_j|^2\ge|\beta_{i(j)}^*|^2.$$
Change the order of the $x_j$ so that $i(1)\le i(2)\le\cdots\le i(t)$; then $j\le i(j)$ for all $1\le j\le t$. Otherwise, if $i(j)<j$ for some $j$, then
$$\{x_1,x_2,\dots,x_j\}\subset L(\beta_1,\beta_2,\dots,\beta_{j-1}),$$
which contradicts the linear independence of $x_1,x_2,\dots,x_j$. Thus $j\le i(j)$. By (7.73) of Lemma 7.35,
$$|\beta_j|^2\le2^{i(j)-1}|\beta_{i(j)}^*|^2\le2^{n-1}|\beta_{i(j)}^*|^2\le2^{n-1}|x_j|^2.$$
Thus (7.77) holds, and the Lemma holds.


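Remark 7.4 We give a proof of $r_k^*=r_k$ in Theorem 7.5. Because $k$ is the largest subscript with $r_k\neq0$, we have $x=\sum_{i=1}^kr_i\beta_i$. By (7.52) and (7.53),
$$r_k^*=\frac{\langle x,\beta_k^*\rangle}{|\beta_k^*|^2}=\sum_{i=1}^kr_i\frac{\langle\beta_i,\beta_k^*\rangle}{\langle\beta_k^*,\beta_k^*\rangle}=r_k\frac{\langle\beta_k,\beta_k^*\rangle}{\langle\beta_k^*,\beta_k^*\rangle}.$$
Because $\langle\beta_k,\beta_k^*\rangle=\langle\beta_k^*,\beta_k^*\rangle$, we get $r_k^*=r_k$.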

In order to discuss the approximation of the successive shortest vectors of a lattice, let us recall the definitions of the successive minima $\lambda_1,\lambda_2,\dots,\lambda_n$ and of the successive shortest vectors. By Definition 7.6 and Corollary 7.6 in Sect. 7.2, the successive minima $\lambda_1,\lambda_2,\dots,\lambda_n$ of a full-rank lattice are attainable, that is, for all $1\le i\le n$ there is
$$|\alpha_i|=\lambda_i,\quad\alpha_i\in L,\ 1\le i\le n.$$
The vectors $\alpha_1,\alpha_2,\dots,\alpha_n$ are called successive shortest vectors: $|\alpha_i|$ is the shortest under the condition that $\alpha_i$ is linearly independent of $\{\alpha_1,\alpha_2,\dots,\alpha_{i-1}\}$.


Theorem 7.6 Let $\{\beta_1,\beta_2,\dots,\beta_n\}$ be a Reduced basis of a lattice $L$ and $\lambda_1,\lambda_2,\dots,\lambda_n$ the successive minima of $L$. Then we have
$$|\beta_i|^2\le2^{n-1}\lambda_i^2,\quad1\le i\le n.\tag{7.78}$$
Proof We use induction on $i$. The proposition is true when $i=1$ (see Theorem 7.5). If the proposition holds for $i-1$, then applying Lemma 7.36 to the $i$ linearly independent vectors $\alpha_1,\alpha_2,\dots,\alpha_i$, we get
$$|\beta_i|^2\le2^{n-1}\max\{\lambda_1^2,\lambda_2^2,\dots,\lambda_i^2\}=2^{n-1}\lambda_i^2.$$
Therefore, (7.78) holds for all $i$. The Theorem holds.
Next, we use the Reduced basis to solve the closest vector problem (CVP). For any given $t\in\mathbb{R}^n$, because a lattice $L$ has only finitely many lattice points in the ball $\mathrm{Ball}(t,r)$ with center $t$ and radius $r$, there is a lattice point $u_t$ closest to $t$, that is,
$$|u_t-t|=\min_{x\in L,\ x\neq t}|x-t|.\tag{7.79}$$
We use the Reduced basis to find a lattice point $\omega\in L$ such that
$$|\omega-t|\le\gamma_1(n)|u_t-t|;\tag{7.80}$$
$\omega$ is called an approximation of the nearest lattice point $u_t$, and $\gamma_1(n)$ is called the approximation coefficient. Following Babai (1986), to approximate the nearest lattice point $u_t$, we adopt the following two techniques.


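(A) Rounding off. Let $B=[\beta_1,\beta_2,\dots,\beta_n]$ be a Reduced basis of the lattice $L$. For $x\in\mathbb{R}^n$, the rounding vector $[x]_B$ of $x$ is defined as follows: let
$$x=\sum_{i=1}^nx_i\beta_i,\quad x_i\in\mathbb{R},$$
and let $\delta_i$ be the nearest integer to $x_i$; then define
$$[x]_B=\sum_{i=1}^n\delta_i\beta_i.\tag{7.81}$$
$[x]_B$ is called the rounding (discard) vector of $x$ under the basis $B$. Writing $x=[x]_B+\{x\}_B$, we have
$$\{x\}_B=\sum_{i=1}^nc_i\beta_i,\quad-\frac12\le c_i\le\frac12,\ 1\le i\le n.$$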


(B) Nearest plane (adjacent plane).

Let $U=\sum_{i=1}^{n-1}\mathbb{R}\beta_i=L(\beta_1,\beta_2,\dots,\beta_{n-1})\subset\mathbb{R}^n$ be an $(n-1)$-dimensional subspace, $L'=\sum_{i=1}^{n-1}\mathbb{Z}\beta_i\subset L$ a sublattice of $L$, and $v\in L$; $U+v$ is called an affine plane of $\mathbb{R}^n$. For a given $x\in\mathbb{R}^n$, if the distance between $x$ and $U+v$ is the smallest, $U+v$ is called the nearest affine plane of $x$.

Let $x'$ be the orthogonal projection of $x$ on the nearest affine plane $U+v$, let $y\in L'$ be the vector closest to $x'-v$ in $L'$, and let $\omega=y+v$ be the approximation of the vector closest to $x$ in $L$.

Let $L=L(\beta_1,\beta_2,\dots,\beta_n)\subset\mathbb{R}^n$ be a lattice and $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ the corresponding orthogonal basis. For $x\in\mathbb{R}^n$, write $x=\sum_{i=1}^nx_i\beta_i^*$, $x_i\in\mathbb{R}$, and let $\delta_i$ be the nearest integer to $x_i$. According to the nearest plane method, we take (see Lemma 7.43 below)
$$\begin{cases}U=L(\beta_1^*,\beta_2^*,\dots,\beta_{n-1}^*)=L(\beta_1,\beta_2,\dots,\beta_{n-1}),\\ v=\delta_n\beta_n\in L,\\ x'=\sum_{i=1}^{n-1}x_i\beta_i^*+\delta_n\beta_n^*,\\ y=\text{the lattice point closest to } x'-v \text{ in the sublattice } L'=\sum_{i=1}^{n-1}\mathbb{Z}\beta_i \text{ (found recursively by the same method)},\\ \omega=y+v.\end{cases}\tag{7.82}$$


We prove the following theorem.
Theorem 7.7 Let $L=L(B)\subset\mathbb{R}^n$ be a lattice and $B=[\beta_1,\beta_2,\dots,\beta_n]$ a Reduced basis of $L$. For a given $x\in\mathbb{R}^n$, the nearest plane method produces a lattice point $\omega=y+v$ near $x$ in $L$ (by (7.82)) satisfying
$$|\omega-x|\le2^{\frac n2}|u_x-x|,\tag{7.83}$$
where $u_x$ is given by Eq. (7.79), and further
$$|x-\omega|\le2^{\frac n2-1}|\beta_n^*|.\tag{7.84}$$
Proof If $n=1$, then $B=[\theta]$, $\theta\in\mathbb{R}$, $\theta\neq0$. Let $x\in\mathbb{R}$, $x=x_1\theta$, $L=\mathbb{Z}\theta$; then for $m\in\mathbb{Z}$,
$$|x-m\theta|=|x_1\theta-m\theta|=|x_1-m||\theta|\ge|x_1-\delta||\theta|,$$
where $\delta$ is the nearest integer to $x_1$. Let $\omega=\delta\theta$; then
$$|x-\omega|=|x_1-\delta||\theta|\le|x-m\theta|,\quad\forall\,m\in\mathbb{Z}.$$
So $\omega=\delta\theta$ is the lattice point closest to $x$ in $L$, that is, $\omega=u_x\in L$ and
$$|x-\omega|=|u_x-x|.$$
Thus (7.83) holds.
Let $n\ge2$. We observe (see (7.82)) that $v=\delta_n\beta_n$ and $x'=\sum_{i=1}^{n-1}x_i\beta_i^*+\delta_n\beta_n^*$, so
$$|x-x'|=|x_n-\delta_n||\beta_n^*|\le\frac12|\beta_n^*|.\tag{7.85}$$
Since the distance between distinct affine planes $\{U+z\mid z\in L\}$ is at least $|\beta_n^*|$, and $|x-x'|$ is the distance between $x$ and the nearest affine plane, there is
$$|x-x'|\le|u_x-x|.\tag{7.86}$$
Let $\omega=y+v=y+\delta_n\beta_n\in L$; we prove that
$$|x-\omega|^2=|x-x'|^2+|x'-\omega|^2.\tag{7.87}$$
Because $x-x'=(x_n-\delta_n)\beta_n^*$ and $x'-\omega=(x'-v)-y\in U$, we have $(x-x')\perp(x'-\omega)$. Therefore, by the Pythagorean theorem, (7.87) holds. By induction (see (7.85)), we have
$$|x-\omega|^2\le\frac14\left(|\beta_1^*|^2+|\beta_2^*|^2+\cdots+|\beta_n^*|^2\right).$$
By (7.71),
$$|\beta_i^*|^2\le2^{n-i}|\beta_n^*|^2.$$
Thus
$$|x-\omega|^2\le\frac14\left(1+2+\cdots+2^{n-1}\right)|\beta_n^*|^2=\frac14(2^n-1)|\beta_n^*|^2\le2^{n-2}|\beta_n^*|^2.$$
There is
$$|x-\omega|\le2^{\frac n2-1}|\beta_n^*|,\tag{7.88}$$
that is, (7.84) holds. To prove (7.83), we consider two cases.
Case 1: $u_x\in U+v$. In this case $u_x-v\in U$, so $u_x-v\in L'$ is the lattice point closest to $x'-v$ in $L'$ (for $z\in U+v$, $|x-z|^2=|x-x'|^2+|x'-z|^2$). Hence, by the induction hypothesis,
$$|x'-\omega|=|(x'-v)-y|\le C_{n-1}|(x'-v)-(u_x-v)|=C_{n-1}|x'-u_x|\le C_{n-1}|x-u_x|,$$
where $C_n=2^{\frac n2}$. By (7.87) and (7.86), we have
$$|x-\omega|^2\le(1+C_{n-1}^2)|x-u_x|^2\le C_n^2|x-u_x|^2.$$
The proposition holds.
Case 2: $u_x\notin U+v$. Then
$$|x-u_x|\ge\frac12|\beta_n^*|.$$
By (7.88), we get
$$|x-\omega|\le2^{\frac n2-1}|\beta_n^*|\le2^{\frac n2}|x-u_x|.$$
Thus, Theorem 7.7 holds.


Comparing Theorems 7.6 and 7.7: when $x=0$, the approximation coefficient of Theorem 7.6 is $2^{\frac{n-1}{2}}$; for general $x\in\mathbb{R}^n$, there is an additional factor $\sqrt2$ in the approximation coefficient. Using the rounding off technique, we can also give an approximation to the closest vector; another main result of this section is the following.

Theorem 7.8 Let $B=[\beta_1,\beta_2,\dots,\beta_n]$ be a Reduced basis of $L$, $x\in\mathbb{R}^n$ arbitrary, $u_x\in L$ the lattice point closest to $x$, and $[x]_B$ given by Eq. (7.81). Then $\omega=[x]_B\in L$, and
$$|x-[x]_B|\le\left(1+2n\left(\frac92\right)^{\frac n2}\right)|x-u_x|.\tag{7.89}$$
By Theorem 7.8, $[x]_B\in L$ is an approximation of the nearest lattice point $u_x$, with approximation coefficient $\gamma_1(n)=1+2n\left(\frac92\right)^{\frac n2}$. It is somewhat worse than the approximation coefficient given by the nearest plane method, but the approximation vector is simpler to compute. In lattice cryptosystems, using $[x]_B$ as input information is more efficient. To prove Theorem 7.8, we need the following Lemma.


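Lemma 7.37 Let $B=[\beta_1,\beta_2,\dots,\beta_n]$ be a Reduced basis of $\mathbb{R}^n$, and let $\theta_k$ be the angle between the vector $\beta_k$ and the subspace $U_k$, where
$$U_k=\sum_{i\neq k}\mathbb{R}\beta_i.\tag{7.90}$$
Then for each $k$, $1\le k\le n$, we have
$$\sin\theta_k\ge\left(\frac29\right)^{\frac n2}.\tag{7.91}$$
Proof For given $1\le k\le n$ and any $m\in U_k$, we prove
$$|\beta_k|\le\left(\frac92\right)^{\frac n2}|m-\beta_k|,\quad m\in U_k.\tag{7.92}$$
Because
$$\sin\theta_k=\min_{m\in U_k}\frac{|m-\beta_k|}{|\beta_k|},$$
(7.92) implies (7.91), and the Lemma holds. To prove (7.92), let $\{\beta_1^*,\beta_2^*,\dots,\beta_n^*\}$ be the orthogonal basis corresponding to the Reduced basis $\{\beta_1,\beta_2,\dots,\beta_n\}$; then $m\in U_k$ can be expressed as
$$m=\sum_{i\neq k}a_i\beta_i=\sum_{j=1}^nb_j\beta_j^*,\quad a_i,b_j\in\mathbb{R}.$$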
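Write
$$m=(a_1,\dots,a_n)\begin{pmatrix}\beta_1\\ \vdots\\ \beta_n\end{pmatrix}=(a_1,\dots,a_n)\,U\begin{pmatrix}\beta_1^*\\ \vdots\\ \beta_n^*\end{pmatrix},$$
where $a_k=0$ and $U$ is the transition matrix of the Gram-Schmidt orthogonalization (see (7.55)). Then for any $1\le j\le n$ and $1\le k\le n$, there is
$$b_j=\sum_{i=j}^na_iu_{ij},\qquad\beta_k=\sum_{j=1}^ku_{kj}\beta_j^*.$$
So
$$m-\beta_k=\sum_{j=1}^n\gamma_j\beta_j^*,\quad\text{where }\gamma_j=b_j-u_{kj}.$$
Now set $a_k=-1$; then
$$\gamma_j=\sum_{i=j}^na_iu_{ij}=a_j+\sum_{i=j+1}^na_iu_{ij}.\tag{7.93}$$
Therefore, Eq. (7.92) can be rewritten as
$$|\beta_k|^2=\sum_{j=1}^ku_{kj}^2|\beta_j^*|^2\le\left(\frac92\right)^n\sum_{j=1}^n\gamma_j^2|\beta_j^*|^2.\tag{7.94}$$
Let us first prove the following assertion:
$$\sum_{j=k}^n\gamma_j^2\ge\left(\frac23\right)^{2(n-k)}.\tag{7.95}$$
If the above formula does not hold, i.e.,
$$\sum_{j=k}^n\gamma_j^2<\left(\frac23\right)^{2(n-k)},$$
then for all $j$, $k\le j\le n$, there is
$$\gamma_j^2<\left(\frac23\right)^{2(n-k)}\ \Rightarrow\ |\gamma_j|<\left(\frac23\right)^{n-k}.\tag{7.96}$$
By (7.93),
$$\begin{aligned}\gamma_n&=a_n,\\ \gamma_{n-1}&=a_{n-1}+a_nu_{n,n-1},\\ \gamma_{n-2}&=a_{n-2}+a_{n-1}u_{n-1,n-2}+a_nu_{n,n-2},\\ &\ \,\vdots\\ \gamma_k&=a_k+a_{k+1}u_{k+1,k}+\cdots+a_nu_{nk}.\end{aligned}$$
We can prove
$$|a_j|<\left(\frac32\right)^{n-j}\left(\frac23\right)^{n-k},\quad k\le j\le n.\tag{7.97}$$
Because $a_n=\gamma_n$ when $j=n$, (7.96) ensures that (7.97) holds for $j=n$. By reverse induction on $j$ ($k\le j<n$), using (7.93) and $|u_{ij}|\le\frac12$,
$$|a_j|\le|\gamma_j|+\sum_{i=j+1}^n|a_i||u_{ij}|<\left(\frac23\right)^{n-k}+\frac12\sum_{i=j+1}^n\left(\frac32\right)^{n-i}\left(\frac23\right)^{n-k}=\left(\frac23\right)^{n-k}\left(1+\frac12\sum_{i=0}^{n-j-1}\left(\frac32\right)^{i}\right)=\left(\frac32\right)^{n-j}\left(\frac23\right)^{n-k}.$$
Therefore, under the assumption (7.96), we have (7.97). Taking $j=k$ in (7.97) gives $|a_k|<1$, but $a_k=-1$; this contradiction shows that (7.96) does not hold, thus (7.95) holds.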
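We now prove Formula (7.94) to complete the proof of the Lemma. By Lemma 7.33,
$$|\beta_j^*|^2\le2^{k-j}|\beta_k^*|^2,\quad1\le j\le k,$$
and
$$|\beta_k^*|^2\le2^{j-k}|\beta_j^*|^2,\quad k\le j\le n.$$
Therefore, using $|u_{kj}|\le1$, the left side of Eq. (7.94) is estimated by
$$\sum_{j=1}^ku_{kj}^2|\beta_j^*|^2\le\sum_{j=1}^k2^{k-j}|\beta_k^*|^2=(2^k-1)|\beta_k^*|^2\le\left(\frac92\right)^k|\beta_k^*|^2.$$
On the other hand, by (7.95), the right side of (7.94) is estimated by
$$\left(\frac92\right)^n\sum_{j=1}^n\gamma_j^2|\beta_j^*|^2\ge\left(\frac92\right)^n\sum_{j=k}^n\gamma_j^2|\beta_j^*|^2\ge\left(\frac92\right)^n\sum_{j=k}^n\gamma_j^22^{k-j}|\beta_k^*|^2\ge\left(\frac92\right)^n2^{k-n}\left(\frac23\right)^{2(n-k)}|\beta_k^*|^2=\left(\frac92\right)^k|\beta_k^*|^2.$$
Thus (7.94) holds, and we complete the proof of Lemma 7.37.
Now we give the proof of Theorem 7.8.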


Proof (The proof of Theorem 7.8) Let $B=[\beta_1,\beta_2,\dots,\beta_n]$ be a Reduced basis of the lattice $L=L(B)$. For given $1\le k\le n$, let $U_k$ be the linear subspace spanned by $B-\{\beta_k\}$. By Lemma 7.37, we have
$$|\beta_k|\le\left(\frac92\right)^{\frac n2}|m-\beta_k|,\quad\forall\,m\in U_k.\tag{7.98}$$
Let $x\in\mathbb{R}^n$ and $\omega=[x]_B\in L$; then
$$x-\omega=x-[x]_B=\sum_{i=1}^nc_i\beta_i,\quad|c_i|\le\frac12\ (1\le i\le n).$$
Let $u_x$ be the lattice point nearest to $x$ in $L$, and let
$$u_x-\omega=\sum_{i=1}^na_i\beta_i,\quad a_i\in\mathbb{Z}.$$


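We prove
$$|u_x-\omega|\le2n\left(\frac92\right)^{\frac n2}|u_x-x|.\tag{7.99}$$
We may assume $u_x\neq\omega$, and suppose
$$|a_k\beta_k|=\max_{1\le j\le n}|a_j\beta_j|>0.$$
Obviously,
$$|u_x-\omega|\le n|a_k\beta_k|.\tag{7.100}$$
On the other hand,
$$u_x-x=(u_x-\omega)+(\omega-x)=\sum_{i=1}^n(a_i-c_i)\beta_i=(a_k-c_k)(\beta_k-m),$$
where
$$m=-\sum_{j\neq k}\frac{a_j-c_j}{a_k-c_k}\beta_j\in U_k.$$
Since $a_k$ is a nonzero integer and $|c_k|\le\frac12$, we have $|a_k-c_k|\ge\frac12|a_k|$. By (7.98),
$$|u_x-x|=|a_k-c_k||\beta_k-m|\ge\left(\frac29\right)^{\frac n2}|a_k-c_k||\beta_k|\ge\frac12\left(\frac29\right)^{\frac n2}|a_k\beta_k|.$$
There is
$$|a_k\beta_k|\le2\left(\frac92\right)^{\frac n2}|u_x-x|.$$
So, by (7.100),
$$|u_x-\omega|\le2n\left(\frac92\right)^{\frac n2}|u_x-x|.$$
That is, (7.99) holds. Finally,
$$|x-\omega|\le|x-u_x|+|u_x-\omega|\le\left(1+2n\left(\frac92\right)^{\frac n2}\right)|x-u_x|.$$
We complete the proof of Theorem 7.8.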
7.6 GGH/HNF Cryptosystem


Lattice-based cryptosystems are the main research object of post-quantum cryptography. Since the first one was proposed in 1996, they have a history of only a little more than 20 years. The representative technologies include the Ajtai-Dwork cryptosystem, the GGH cryptosystem, the McEliece-Niederreiter cryptosystem based on algebraic code theory, and the NTRU cryptosystem. We will introduce them, respectively, below.

The GGH cryptosystem is a lattice-based cryptosystem proposed by Goldreich, Goldwasser and Halevi in 1997. It is generally considered a new public key cryptosystem that could replace RSA in the post-quantum era.

Let $L\subset\mathbb{Z}^n$ be an integer lattice and $B$, $R$ two generating matrices of $L$, that is,
$$L=L(B)=L(R).$$
Since $L$ has a unique HNF basis (see Lemma 3.4), let $B=\mathrm{HNF}(L)$ be the HNF matrix, with $B$ as the public key and $R$ as the private key. Let $v\in\mathbb{Z}^n$ be an integer point and $e\in\mathbb{R}^n$ an error vector. Let $\sigma$ be a parameter vector, and take $e=\sigma$ or $e=-\sigma$, each chosen with probability $\frac12$.

Encryption: the plaintext is encoded as $v\in\mathbb{Z}^n$, the error vector $e$ is selected randomly according to the parameter vector $\sigma$, and the public key $B$ is used for encryption. The encryption function $f_{B,\sigma}$ is defined as
$$f_{B,\sigma}(v,e)=Bv+e=c\in\mathbb{R}^n.\tag{7.101}$$


Decryption: the ciphertext $c$ is decrypted with the private key $R$. Because $c\in\mathbb{R}^n$ and $R=[\alpha_1,\alpha_2,\dots,\alpha_n]$, $c$ can be expressed as a linear combination of $\alpha_1,\alpha_2,\dots,\alpha_n$,
$$c=\sum_{i=1}^nx_i\alpha_i,\quad x_i\in\mathbb{R}.$$
Let $\delta_i$ be the nearest integer to $x_i$, and define (see (7.81))
$$[c]_R=\sum_{i=1}^n\delta_i\alpha_i\in L.\tag{7.102}$$
Define the decryption function as
$$f_{B,\sigma}^{-1}(c)=B^{-1}[c]_R=v,\qquad e=c-Bv.\tag{7.103}$$
In order to verify the correctness of the decryption function $f_{B,\sigma}^{-1}$, we first prove the following simple Lemma. Let $x\in\mathbb{R}^n$ and let $R=[\alpha_1,\alpha_2,\dots,\alpha_n]\in\mathbb{R}^{n\times n}$ be any basis of $\mathbb{R}^n$. If $x=(a_1,a_2,\dots,a_n)^T\in\mathbb{R}^n$ and $\gamma_i$ is the integer closest to $a_i$, define (see (7.7))
$$[x]=(\gamma_1,\gamma_2,\dots,\gamma_n)^T\in\mathbb{Z}^n.\tag{7.104}$$
Write $x=\sum_{i=1}^nx_i\alpha_i$ and let $\delta_i$ be the nearest integer to $x_i$; then define (see (7.102))
$$[x]_R=\sum_{i=1}^n\delta_i\alpha_i\in L(R).\tag{7.105}$$


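Lemma 7.38 For any $x\in\mathbb{R}^n$ and any basis $R\in\mathbb{R}^{n\times n}$ of $\mathbb{R}^n$, we have
$$[x]_R=R[R^{-1}x].$$
Proof Write
$$x=\begin{pmatrix}a_1\\ a_2\\ \vdots\\ a_n\end{pmatrix}\in\mathbb{R}^n,\qquad[x]=\begin{pmatrix}\gamma_1\\ \gamma_2\\ \vdots\\ \gamma_n\end{pmatrix}\in\mathbb{Z}^n,\qquad|a_i-\gamma_i|\le\frac12.$$
If $x=\sum_{i=1}^nx_i\alpha_i$ with $R=[\alpha_1,\alpha_2,\dots,\alpha_n]$, then
$$x=R\begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix},\qquad[x]_R=R\begin{pmatrix}\delta_1\\ \delta_2\\ \vdots\\ \delta_n\end{pmatrix},$$
where $\delta_i$ is the nearest integer to $x_i$. Thus
$$R^{-1}x=\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix},\qquad\begin{pmatrix}\delta_1\\ \vdots\\ \delta_n\end{pmatrix}=[R^{-1}x],$$
so $[x]_R=R[R^{-1}x]$. Lemma 7.38 holds.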


Theorem 7.9 Let $L=L(R)=L(B)\subset\mathbb{Z}^n$ be an integer lattice, $B$ the public key, $R$ the private key, $v\in\mathbb{Z}^n$ the plaintext and $e$ the error vector. Then
$$f_{B,\sigma}^{-1}(c)\neq v\quad\text{if and only if}\quad[R^{-1}e]\neq0.$$
Proof By definition, the ciphertext is $c=Bv+e=f_{B,\sigma}(v,e)$, and
$$f_{B,\sigma}^{-1}(c)=B^{-1}[c]_R=B^{-1}R[R^{-1}c]=T[R^{-1}c],\tag{7.106}$$
where $T=B^{-1}R\in\mathbb{R}^{n\times n}$ is a unimodular matrix. Indeed, because $L(B)=L(R)$,
$$B=RU,\quad U\in GL_n(\mathbb{Z}).$$
So
$$B^{-1}R=U^{-1}R^{-1}R=U^{-1}=T,$$
that is, $T$ is a unimodular matrix. By (7.106),
$$T[R^{-1}c]=T[R^{-1}(Bv+e)]=T[T^{-1}v+R^{-1}e].$$
Because $T$ is a unimodular matrix and $v\in\mathbb{Z}^n$, we have $T^{-1}v\in\mathbb{Z}^n$, so
$$[T^{-1}v+R^{-1}e]=T^{-1}v+[R^{-1}e].\tag{7.107}$$
Thus
$$T[R^{-1}c]=v+T[R^{-1}e].$$
That is,
$$f_{B,\sigma}^{-1}(c)=v+T[R^{-1}e].$$
Because $T$ is invertible, $T[R^{-1}e]=0\iff[R^{-1}e]=0$, so the Theorem holds.


By Theorem 7.9, whether the GGH cryptographic mechanism decrypts correctly depends entirely on whether $[R^{-1}e]$ is the zero vector, that is,
$$f_{B,\sigma}^{-1}(c)=v\iff[R^{-1}e]=0.\tag{7.108}$$
Therefore, when the private key $R$ is given, the selection of the error vector $e$ and the parameter vector $\sigma$ becomes the key to the correctness of the GGH cryptosystem. Note from (7.106) that if we decrypt with the public key $B$, then
$$[B^{-1}c]=[B^{-1}(Bv+e)]=[v+B^{-1}e]=v+[B^{-1}e].$$
Therefore, the basic conditions for the security and correctness of the GGH cryptosystem are
$$[R^{-1}e]=0,\qquad[B^{-1}e]\neq0.\tag{7.109}$$
Because the public key $B$ we choose is an HNF matrix, $[B^{-1}e]\neq0$ is easy to satisfy. Let $B=(b_{ij})_{n\times n}$ and $B^{-1}=(c_{ij})_{n\times n}$; since $B$ is upper triangular, $c_{nn}=b_{nn}^{-1}$ and the last component of $B^{-1}e$ is $e_n/b_{nn}$. Let $e=(e_1,e_2,\dots,e_n)$, where each $e_i$ has the same absolute value $|e_i|=\sigma$, $\sigma$ being the parameter. Thus
$$2|e_n|>b_{nn}\ \Rightarrow\ [B^{-1}e]\neq0.$$
Let us focus on $[R^{-1}e]=0$.
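For $x=(x_1,x_2,\dots,x_n)\in\mathbb{R}^n$, define the $L_1$ norm $|x|_1$ and the $L_\infty$ norm $|x|_\infty$ of $x$ as
$$|x|_1=\sum_{i=1}^n|x_i|,\qquad|x|_\infty=\max_{1\le i\le n}|x_i|.\tag{7.110}$$
Lemma 7.39 Let $R\in\mathbb{R}^{n\times n}$ be an invertible square matrix and
$$R^{-1}=\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix},$$
where $\alpha_i$ is the $i$-th row vector of $R^{-1}$. Let $e=(e_1,e_2,\dots,e_n)\in\mathbb{R}^n$ with $|e_i|=\sigma$, $1\le i\le n$, and let
$$\rho=\max_{1\le i\le n}|\alpha_i|_1\tag{7.111}$$
be the maximum of the $L_1$ norms of the $n$ row vectors of $R^{-1}$. Then when $\sigma<\frac1{2\rho}$, we have $[R^{-1}e]=0$.
Proof Suppose $\alpha_i=(c_{i1},c_{i2},\dots,c_{in})$; the $i$-th component of $R^{-1}e$ satisfies
$$\left|\sum_{j=1}^nc_{ij}e_j\right|\le\sigma\sum_{j=1}^n|c_{ij}|=\sigma|\alpha_i|_1\le\sigma\rho.$$
If $\sigma<\frac1{2\rho}$, then each component of $R^{-1}e$ has absolute value less than $\frac12$, so $[R^{-1}e]=0$.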
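Lemma 7.40 Let $R\in\mathbb{R}^{n\times n}$ be invertible, $R^{-1}=\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_n\end{pmatrix}$, and let $\max_{1\le i\le n}|\alpha_i|_\infty=\frac{\gamma}{\sqrt n}$. Then the probability of $[R^{-1}e]\neq0$ satisfies
$$P\{[R^{-1}e]\neq0\}\le2n\exp\left(-\frac1{8\sigma^2\gamma^2}\right),\tag{7.112}$$
where $\sigma$ is the parameter and the error vector $e=(e_1,\dots,e_n)$ has $|e_i|=\sigma$.
Proof Let $R^{-1}=(c_{ij})_{n\times n}$ and
$$R^{-1}e=\begin{pmatrix}a_1\\ \vdots\\ a_n\end{pmatrix},\quad\text{where }a_i=\sum_{j=1}^nc_{ij}e_j.$$
Because $|c_{ij}|\le\frac{\gamma}{\sqrt n}$ and $|e_j|=\sigma$, each $c_{ij}e_j$ lies in the interval $\left[-\frac{\gamma\sigma}{\sqrt n},\frac{\gamma\sigma}{\sqrt n}\right]$. Therefore, assuming the signs of the $e_j$ are chosen independently, each with probability $\frac12$, the Hoeffding inequality gives
$$P\left\{|a_i|\ge\frac12\right\}\le2\exp\left(-\frac{2\cdot(1/2)^2}{n\cdot(2\gamma\sigma/\sqrt n)^2}\right)=2\exp\left(-\frac1{8\sigma^2\gamma^2}\right).$$
For $[R^{-1}e]\neq0$, at least one of the events $\{|a_i|\ge\frac12\}$ must occur. Thus
$$P\{[R^{-1}e]\neq0\}\le\sum_{i=1}^nP\left\{|a_i|\ge\frac12\right\}\le2n\exp\left(-\frac1{8\sigma^2\gamma^2}\right).$$
The Lemma holds.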


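Corollary 7.8 For any given $\varepsilon>0$, if the parameter $\sigma$ satisfies
$$\sigma\le\left(\gamma\sqrt{8\ln\frac{2n}{\varepsilon}}\right)^{-1},\quad\text{then}\quad P\{[R^{-1}e]\neq0\}\le\varepsilon.\tag{7.113}$$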


In order to give a direct impression of Eq. (7.113), let us consider an example. Let $n=120$ and $\varepsilon=10^{-5}$. When the elements of the matrix $R$ vary in the interval $[-4,4]$, it can be verified that the $L_\infty$ norm of the row vectors of $R^{-1}=(c_{ij})_{n\times n}$ is approximately $\frac1{30\sqrt n}$, thus $\gamma\approx\frac1{30}$. By the Corollary, when
$$\sigma<\left(\gamma\sqrt{8\ln(240\times10^5)}\right)^{-1}\approx\frac{30}{11.7}\approx2.6,$$
we have
$$P\{[R^{-1}e]\neq0\}<10^{-5}.$$


It can be seen from the above analysis that the GGH cryptosystem does not effectively solve the selection of the private key $R$, the public key $B$, and especially the parameter $\sigma$ and the error vector. In 2001, Professor Micciancio of the University of California, San Diego further improved the GGH cryptosystem by using the HNF basis and the nearest plane method. In order to introduce the GGH/HNF cryptosystem, we review several important results from the previous sections.


Lemma 7.41 Let $L = L(B) \subset \mathbb{R}^n$ be a lattice, $B = [\beta_1, \beta_2, \ldots, \beta_n]$ a generating basis, $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ the corresponding orthogonal basis, and $\lambda_1 = \lambda(L)$ the minimum distance of $L$. Then:

(i)
$$\lambda_1 = \lambda(L) \ge \min_{1\le i\le n}|\beta_i^*|. \tag{7.114}$$
For $L = L(B)$, take the parameter $\rho = \rho(B)$ as
$$\rho = \frac12\min_{1\le i\le n}|\beta_i^*|. \tag{7.115}$$
Then for any $x \in \mathbb{R}^n$, there is at most one lattice point
$$\alpha \in L \text{ such that } |x - \alpha| < \rho. \tag{7.116}$$

(ii) Suppose $L \subset \mathbb{Z}^n$ is an integer lattice. Then $L$ has a unique HNF basis $B$, that is, $L = L(B)$ with $B = (b_{ij})_{n\times n}$ satisfying
$$0 \le b_{ij} < b_{ii} \text{ when } 1 \le i < j \le n, \qquad b_{ij} = 0 \text{ when } 1 \le j < i \le n.$$
That is, $B$ is an upper triangular matrix, and the corresponding orthogonal basis $B^*$ of $B$ is a diagonal matrix:
$$B^* = \mathrm{diag}(b_{11}, b_{22}, \ldots, b_{nn}).$$
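Proof Equation (7.114) is given by Lemma 7.18, and property (ii) by Lemma 7.26. We only need to prove that if there is a lattice point $\alpha \in L$ with $|x - \alpha| < \rho$, then $\alpha$ is unique. Let $\alpha_1, \alpha_2 \in L$ with
$$|\alpha_1 - x| < \rho,\ |\alpha_2 - x| < \rho \Rightarrow |\alpha_1 - \alpha_2| < 2\rho = \min_{1\le i\le n}|\beta_i^*| \le \lambda_1.$$
Because $\alpha_1 - \alpha_2 \in L$, $\alpha_1 \neq \alpha_2$ would contradict the definition of $\lambda_1$. Hence $\alpha_1 = \alpha_2$.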
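The quantities in Lemma 7.41 can be computed for a small planar lattice. The Python sketch below does exact Gram-Schmidt over Fractions and evaluates $\rho$ from (7.115). It then compares $2\rho$ with the minimum distance $\lambda_1$, which it finds by brute force over small coefficient vectors. Brute force is only an illustrative stand-in and is feasible only in tiny dimensions. The bases are hypothetical examples.

```python
import math
from fractions import Fraction
from itertools import product

def gram_schmidt(B):
    """Gram-Schmidt orthogonalization of the basis vectors beta_1, ..., beta_n (list of columns)."""
    Bs = []
    for b in B:
        v = [Fraction(x) for x in b]
        for u in Bs:
            mu = sum(x * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [x - mu * y for x, y in zip(v, u)]
        Bs.append(v)
    return Bs

def rho(B):
    """Eq. (7.115): half the length of the shortest Gram-Schmidt vector."""
    return min(math.sqrt(sum(x * x for x in v)) for v in gram_schmidt(B)) / 2

def lambda1_bruteforce(B, bound=6):
    """Minimum distance of L(B), searching coefficient vectors in [-bound, bound]^n (tiny examples only)."""
    best = math.inf
    for coeffs in product(range(-bound, bound + 1), repeat=len(B)):
        if any(coeffs):
            v = [sum(c * b[i] for c, b in zip(coeffs, B)) for i in range(len(B[0]))]
            best = min(best, math.sqrt(sum(x * x for x in v)))
    return best

B = [[5, 1], [1, 5]]   # hypothetical basis: beta_1 = (5, 1), beta_2 = (1, 5)
print("rho =", rho(B), "2*rho =", 2 * rho(B), "lambda_1 =", lambda1_bruteforce(B))
```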


In the previous section, we introduced Babai's nearest plane method (see (7.82)). The distance between two subsets $A_1$ and $A_2$ of $\mathbb{R}^n$ is defined as
$$|A_1 - A_2| = \min\{|x - y| : x \in A_1,\ y \in A_2\}.$$
For a vector $x \in \mathbb{R}^n$ and a subset $A \subset \mathbb{R}^n$, the distance between $x$ and $A$ is defined as
$$|x - A| = \min\{|x - y| : y \in A\}.$$
Suppose $L \subset \mathbb{R}^n$ is a lattice, $B = [\beta_1, \beta_2, \ldots, \beta_n]$ is a generating basis, and $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ is the corresponding orthogonal basis. Define the subspace $U$ and the sub-lattice $L'$ by
$$U = \mathrm{span}_{\mathbb{R}}(\beta_1, \beta_2, \ldots, \beta_{n-1}) \cong \mathbb{R}^{n-1}, \qquad L' = \sum_{i=1}^{n-1}\mathbb{Z}\beta_i,$$
and let
$$A_v = U + v, \quad v \in L.$$
$A_v$ is called an affine plane with $v$ as the representative element. For any $x \in \mathbb{R}^n$, let $A_v$ be the affine plane closest to $x$, that is,
$$|x - A_v| = \min\{|x - A_\alpha| : \alpha \in L\}.$$
Let $x'$ be the orthogonal projection of $x$ onto $A_v$. Because $x' - v \in U \cong \mathbb{R}^{n-1}$, we apply the same procedure recursively, with the basis $[\beta_1, \ldots, \beta_{n-1}]$ of $L'$, to obtain a lattice point $y \in L'$ for $x' - v$. We then define the nearest plane operator $\tau_B$ of $x$ under the basis $B$ as
$$\tau_B(x) = w = y + v \in L. \tag{7.117}$$
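The recursive definition (7.117) is usually implemented iteratively. For $i = n, n-1, \ldots, 1$, round the coefficient of $\beta_i^*$ in the current target (as in Lemma 7.43 below) and subtract that integer multiple of $\beta_i$. The Python sketch below uses this standard iterative form. That choice is our assumption about implementation, not a transcription of the text. It also anticipates Lemma 7.44 on a toy basis: every point within distance $\rho$ of $\alpha$ is mapped back to $\alpha$.

```python
from fractions import Fraction

def gram_schmidt(B):
    """Gram-Schmidt vectors beta_1^*, ..., beta_n^* of the basis vectors in B (list of columns)."""
    Bs = []
    for b in B:
        v = [Fraction(x) for x in b]
        for u in Bs:
            mu = sum(x * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [x - mu * y for x, y in zip(v, u)]
        Bs.append(v)
    return Bs

def nearest_plane(B, x):
    """Babai's nearest plane operator tau_B(x), iterative form.

    For i = n, ..., 1: write the current target as sum gamma_j beta_j^*, round gamma_i to the
    nearest integer delta (choosing the closest affine plane), and subtract delta * beta_i.
    Returns the lattice point w = tau_B(x) and its integer coordinates with respect to B.
    """
    Bs = gram_schmidt(B)
    target = [Fraction(t) for t in x]
    coeffs = [0] * len(B)
    for i in reversed(range(len(B))):
        u = Bs[i]
        gamma = sum(t * y for t, y in zip(target, u)) / sum(y * y for y in u)
        coeffs[i] = round(gamma)
        target = [t - coeffs[i] * b for t, b in zip(target, B[i])]
    w = [sum(c * b[k] for c, b in zip(coeffs, B)) for k in range(len(x))]
    return w, coeffs

B = [[5, 1], [1, 5]]                                  # hypothetical basis (columns)
x = [Fraction(29, 2), Fraction(-41, 5)]               # alpha = (13, -7) plus an error of length ~1.92
print(nearest_plane(B, x))
```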
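Lemma 7.42 Under the above definitions, if $v_1, v_2 \in L$ and $A_{v_1} \neq A_{v_2}$, then
$$|A_{v_1} - A_{v_2}| \ge |\beta_n^*|. \tag{7.118}$$
Proof Since $v_1, v_2 \in L$, they can be written as linear combinations of $\{\beta_1^*, \beta_2^*, \ldots, \beta_n^*\}$:
$$v_1 = \sum_{i=1}^{n}a_i\beta_i^*, \quad a_i \in \mathbb{R},\ a_n \in \mathbb{Z},$$
$$v_2 = \sum_{i=1}^{n}b_i\beta_i^*, \quad b_i \in \mathbb{R},\ b_n \in \mathbb{Z}.$$
To prove that the $n$-th coefficients $a_n$ and $b_n$ are integers, write
$$v_1 = \sum_{i=1}^{n}a_i'\beta_i, \quad a_i' \in \mathbb{Z}.$$
Therefore,
$$a_n = \frac{\langle v_1, \beta_n^*\rangle}{|\beta_n^*|^2} = a_n'\frac{\langle \beta_n, \beta_n^*\rangle}{|\beta_n^*|^2} = a_n' \in \mathbb{Z}.$$
This equation uses Eq. (7.52), and $b_n \in \mathbb{Z}$ is proved in the same way. The condition $A_{v_1} \neq A_{v_2}$ means $v_1 - v_2 \notin U$, so $a_n \neq b_n$ and hence $|a_n - b_n| \ge 1$. Therefore
$$|A_{v_1} - A_{v_2}| = |(a_n - b_n)\beta_n^*| \ge |\beta_n^*|.$$
This completes the proof of the Lemma.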


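Lemma 7.43 Under the above definitions and notation, suppose $x \in \mathbb{R}^n$, $x = \sum_{i=1}^{n}\gamma_i\beta_i^*$, and let $\delta$ be the integer nearest to $\gamma_n$. Then:

(i)
$$v = \delta\beta_n, \qquad x' = \sum_{i=1}^{n-1}\gamma_i\beta_i^* + \delta\beta_n^*. \tag{7.119}$$
That is, the affine plane closest to $x$ is $A_{\delta\beta_n}$, and the orthogonal projection of $x$ onto $A_v$ is $x'$.

(ii) Let $u_x \in L$ be the lattice point closest to $x$. Then
$$|x - x'| \le |x - u_x|. \tag{7.120}$$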
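Proof Take $v = \delta\beta_n$. Then $v \in L$, and we want to prove that the distance between $x$ and $A_v$ is the smallest. Because $x = \sum_{i=1}^{n}\gamma_i\beta_i^*$ and $\beta_n - \beta_n^* \in U$ (see (7.121) below),
$$x - v = \sum_{i=1}^{n-1}\gamma_i\beta_i^* + \gamma_n\beta_n^* - \delta\beta_n \equiv (\gamma_n - \delta)\beta_n^* \pmod U,$$
so
$$|x - A_v| = |(x - v) - U| = |\gamma_n - \delta|\,|\beta_n^*| \le \frac12|\beta_n^*|.$$
Let $v_1 \in L$ with $v_1 - v \notin U$. By the triangle inequality and Lemma 7.42,
$$|x - A_{v_1}| \ge |A_{v_1} - A_v| - |x - A_v| \ge |\beta_n^*| - \frac12|\beta_n^*| = \frac12|\beta_n^*| \ge |x - A_v|.$$
So it is correct to take $v = \delta\beta_n$. Next, we prove that the orthogonal projection $x'$ of $x$ onto the affine plane $A_v$ is
$$x' = \sum_{i=1}^{n-1}\gamma_i\beta_i^* + \delta\beta_n^*.$$
First we prove $x' \in A_v$. We have $v = \delta\beta_n$ and
$$\beta_n = \sum_{i=1}^{n-1}c_i\beta_i^* + \beta_n^* \Rightarrow \delta\beta_n = \sum_{i=1}^{n-1}\delta c_i\beta_i^* + \delta\beta_n^* = v. \tag{7.121}$$
Thus
$$x' - v = \sum_{i=1}^{n-1}(\gamma_i - \delta c_i)\beta_i^* \in U.$$
That is, $x' \in U + v = A_v$. Moreover, $x - x' = (\gamma_n - \delta)\beta_n^*$, so $(x - x') \perp U$. Because $U \parallel A_v$, i.e., $A_v$ and $U$ are two parallel planes, we get $(x - x') \perp A_v$. This proves that the orthogonal projection of $x$ onto $A_v$ is $x'$, and thus (i) holds.

The proof of (ii) is direct. By definition, for any $\alpha \in L$, the distance from $x$ to $\alpha$ and to the affine plane $A_\alpha$ satisfies
$$|x - \alpha| \ge |x - A_\alpha|.$$
When $\alpha = v$, because $(x - x') \perp A_v$, we have
$$|x - x'| = |x - A_v| \le |x - A_\alpha|, \quad \forall \alpha \in L.$$
Let $u_x \in L$ be the lattice point closest to $x$. Taking $\alpha = u_x$ gives
$$|x - x'| \le |x - A_{u_x}| \le |x - u_x|.$$
The Lemma holds.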


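Lemma 7.44 Let $L = L(B) \subset \mathbb{R}^n$ be a lattice, $x \in \mathbb{R}^n$ and $\alpha \in L$. If $|x - \alpha| < \rho$, where $\rho = \frac12\min\{|\beta_i^*| : 1 \le i \le n\}$, then the nearest plane operator $\tau_B$ satisfies
$$\tau_B(x) = \alpha. \tag{7.122}$$
Proof We have
$$|x - A_\alpha| \le |x - \alpha| < \rho \le \frac12|\beta_n^*|.$$
By Lemma 7.42, every other affine plane is at distance greater than $\frac12|\beta_n^*|$ from $x$, so $A_\alpha$ is the plane closest to $x$, that is, $A_\alpha = A_v$. Moreover $\tau_B(x) = w = y + v$, and (by induction on the dimension) we have
$$|x - w| \le |x - \alpha| < \rho. \tag{7.123}$$
By Lemma 7.41, $\alpha = w = \tau_B(x)$. The Lemma holds.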


Now we introduce the workflow of the GGH/HNF cryptosystem:

1. $L = L(B) = L(R) \subset \mathbb{Z}^n$ is an integer lattice. $R = [r_1, r_2, \ldots, r_n]$ is the private key, and $B = [\beta_1, \beta_2, \ldots, \beta_n]$ is the public key, namely the HNF basis of $L$, where
$$B^* = \mathrm{diag}(b_{11}, b_{22}, \ldots, b_{nn}).$$
We choose the private key $R$ as a particularly good basis and set $\rho = \frac12\min\{|r_i^*| : 1 \le i \le n\}$. In particular, the public key $B$ satisfies
$$\frac12 b_{ii} < \rho, \quad \forall\, 1 \le i \le n.$$
2. Let $v \in \mathbb{Z}^n$ be an integer vector and $e \in \mathbb{R}^n$ an error vector satisfying $|e| < \rho$.
3. Encryption: after the plaintext $v \in \mathbb{Z}^n$ and the error vector $e$ are selected, with $\rho$ as the parameter, the encryption function $f_{B,\rho}$ is defined as
$$f_{B,\rho}(v, e) = Bv + e = c.$$
4. Decryption: we decrypt the ciphertext $c$ with the private key $R$. The decryption transformation is
$$f_{B,\rho}^{-1}(c) = B^{-1}\tau_R(c),$$
where $\tau_R$ is the nearest plane operator defined by $R$.

By Lemma 7.44, when $|e| < \rho$ we have $|c - Bv| = |e| < \rho$, thus
$$B^{-1}\tau_R(c) = B^{-1}\tau_R(Bv + e) = B^{-1}Bv = v. \tag{7.124}$$


7.6 GGH/HNF Cryptosystem 317 


This ensures the correctness of decryption. 
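A toy run of the GGH/HNF workflow above, in Python, makes the steps concrete. The private basis $R$ has columns $(5,1)$ and $(1,5)$. The public basis $B$, with columns $(24,0)$ and $(5,1)$, is its upper triangular HNF. We computed $B$ by hand for this example, and the test checks that both bases generate the same lattice. Here $\rho(R) = 12/\sqrt{26} \approx 2.35$ while the analogous quantity for $B$ is only $\frac12$. So an error of length about 1.92 is corrected with the private basis but not with the public one. All names are hypothetical, and the parameters are far too small to be secure.

```python
from fractions import Fraction

def gram_schmidt(B):
    """Gram-Schmidt vectors of the basis vectors in B (list of columns)."""
    Bs = []
    for b in B:
        v = [Fraction(x) for x in b]
        for u in Bs:
            mu = sum(x * y for x, y in zip(b, u)) / sum(y * y for y in u)
            v = [x - mu * y for x, y in zip(v, u)]
        Bs.append(v)
    return Bs

def nearest_plane(B, x):
    """tau_B(x) by Babai's nearest plane method (iterative form); returns the lattice point."""
    Bs = gram_schmidt(B)
    target = [Fraction(t) for t in x]
    coeffs = [0] * len(B)
    for i in reversed(range(len(B))):
        u = Bs[i]
        coeffs[i] = round(sum(t * y for t, y in zip(target, u)) / sum(y * y for y in u))
        target = [t - coeffs[i] * b for t, b in zip(target, B[i])]
    return [sum(c * b[k] for c, b in zip(coeffs, B)) for k in range(len(x))]

def hnf_solve(B, w):
    """Solve B v = w for an upper triangular basis B given by columns (B[j][i] = row i, column j)."""
    n = len(B)
    v = [Fraction(0)] * n
    for i in reversed(range(n)):
        v[i] = (Fraction(w[i]) - sum(B[j][i] * v[j] for j in range(i + 1, n))) / B[i][i]
    return v

def encrypt(B, v, e):
    """f_{B,rho}(v, e) = B v + e."""
    return [sum(vj * b[k] for vj, b in zip(v, B)) + e[k] for k in range(len(e))]

def decrypt(R, B, c):
    """B^{-1} tau_R(c), Eq. (7.124); the nearest plane step uses the basis R."""
    return hnf_solve(B, nearest_plane(R, c))

R = [[5, 1], [1, 5]]     # private basis (columns r_1, r_2), rho(R) = 12/sqrt(26), about 2.35
B = [[24, 0], [5, 1]]    # public HNF basis of the same lattice (columns), rho(B) = 1/2
v = [3, -7]
e = [Fraction(3, 2), Fraction(-6, 5)]   # |e| is about 1.92 < rho(R)
c = encrypt(B, v, e)
print("ciphertext:", c)
print("decrypt with R:", decrypt(R, B, c))
print("decrypt with B:", decrypt(B, B, c))
```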


GGH and GGH/HNF use the same encryption function, but their decryption transformations are very different. GGH uses Babai's rounding-off method, while GGH/HNF uses Babai's nearest plane method. The two schemes also differ in how the error vector $e$ is chosen. In GGH, $e$ depends on the parameter $\sigma$: each component of $e$ is $\pm\sigma$. In GGH/HNF, $e$ depends only on the parameter $\rho$: any $e$ of length less than $\rho$ will do. Therefore, GGH/HNF offers greater flexibility in choosing the error vector $e$.

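Next, we explain why the public key $B$ is chosen to be the HNF basis. For any integer lattice $L = L(B) \subset \mathbb{Z}^n$, let $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ be the corresponding orthogonal basis. The congruence relation mod $L$ defines an equivalence relation on $\mathbb{R}^n$, which restricts to an equivalence relation among the integral points of $\mathbb{Z}^n$. By Lemma 7.24, the quotient group $\mathbb{Z}^n/L$ is a finite group and $|\mathbb{Z}^n/L| = d(L)$. We now give a set of representatives of $\mathbb{Z}^n/L$. Let
$$F(B^*) = \Big\{\sum_{i=1}^{n}x_i\beta_i^* : 0 \le x_i < 1\Big\} \tag{7.125}$$
be the parallelepiped spanned by $B^*$. It is a rectangular box, since the $\beta_i^*$ are mutually orthogonal. Compare it with the fundamental domain $F = F(B)$ of $\mathbb{R}^n/L$ (see Lemma 7.16):
$$F = F(B) = \Big\{\sum_{i=1}^{n}x_i\beta_i : 0 \le x_i < 1\Big\}.$$
In general, $F$ is only an oblique parallelepiped.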


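Lemma 7.45 For any integer point $\alpha \in \mathbb{Z}^n$, there is a unique $w \in F(B^*)$ such that $\alpha \equiv w \pmod L$.

Proof Since $\alpha \in \mathbb{Z}^n$ is an integer point, $\alpha$ can be expressed as a linear combination of $B^*$:
$$\alpha = \sum_{i=1}^{n}a_i\beta_i^*, \quad a_i \in \mathbb{R}.$$
Let $[a_i]$ denote the largest integer not greater than $a_i$, and set
$$w = \alpha - \sum_{i=1}^{n}[a_i]\beta_i. \tag{7.126}$$
Then
$$\alpha - w = \sum_{i=1}^{n}[a_i]\beta_i \in L \Rightarrow \alpha \equiv w \pmod L.$$
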
318 7 Lattice-Based Cryptography 


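We now prove that $w \in F(B^*)$. Express $w$ in terms of the basis vectors of $B^*$:
$$w = \sum_{i=1}^{n}b_i\beta_i^*.$$
We only need to prove that $0 \le b_i < 1$. By (7.52), it is not difficult to obtain
$$b_n = \frac{\langle w, \beta_n^*\rangle}{|\beta_n^*|^2} = a_n - [a_n]\frac{\langle \beta_n, \beta_n^*\rangle}{|\beta_n^*|^2} = a_n - [a_n].$$
Thus $0 \le b_n < 1$. By induction, it is not difficult to verify that $0 \le b_i < 1$ for all $1 \le i \le n$, that is, $w \in F(B^*)$.

To prove uniqueness, let
$$w = \sum_{i=1}^{n}a_i\beta_i^*, \quad \text{where } |a_i| < 1.$$
We prove that
$$w \equiv 0 \pmod L \iff w = 0. \tag{7.127}$$
Write $w = \sum_{i=1}^{n}b_i\beta_i$, so that $b_i \in \mathbb{Z}$ when $w \in L$. Then by (7.52),
$$a_n = \frac{\langle w, \beta_n^*\rangle}{|\beta_n^*|^2} = b_n\frac{\langle \beta_n, \beta_n^*\rangle}{|\beta_n^*|^2} = b_n.$$
Because $w \in L$ and $|b_n| = |a_n| < 1$, we get $b_n = 0$. By induction, $b_{n-1} = \cdots = b_1 = 0$. That is, $w = 0$, and (7.127) holds.

Now let $\alpha \in \mathbb{Z}^n$. If $w_1, w_2 \in F(B^*)$ with $\alpha \equiv w_1 \pmod L$ and $\alpha \equiv w_2 \pmod L$, then
$$w_1 - w_2 \equiv 0 \pmod L.$$
The coordinates of $w_1 - w_2$ with respect to $B^*$ all have absolute value less than 1, so (7.127) gives $w_1 = w_2$. Consequently, if $w_1, w_2 \in F(B^*)$ and $w_1 \neq w_2$, then $w_1 \not\equiv w_2 \pmod L$. That is, the points of $F(B^*)$ are pairwise incongruent mod $L$, and the Lemma holds.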


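By the above lemma, any two distinct points of the parallelepiped $F(B^*)$ are incongruent mod $L$. Therefore, for distinct lattice points $\alpha_1, \alpha_2 \in L$,
$$\{F(B^*) + \alpha_1\} \cap \{F(B^*) + \alpha_2\} = \varnothing.$$
Thus $\mathbb{R}^n$ can be split as
$$\mathbb{R}^n = \bigcup_{\alpha \in L}\big(F(B^*) + \alpha\big). \tag{7.128}$$
By Lemma 7.45, for any $\alpha \in \mathbb{Z}^n$ there is a unique $w \in F(B^*)$ with $\alpha \equiv w \pmod L$. Define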
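$$w = \alpha \bmod L.$$
Then $\alpha \mapsto \alpha \bmod L$ is a surjection $\mathbb{Z}^n \to \mathbb{Z}^n \cap F(B^*)$, and it induces a one-to-one correspondence $\mathbb{Z}^n/L \to \mathbb{Z}^n \cap F(B^*)$. Indeed, for $\alpha, \beta \in \mathbb{Z}^n$,
$$\alpha \equiv \beta \pmod L \Rightarrow \alpha \bmod L = \beta \bmod L \in \mathbb{Z}^n \cap F(B^*),$$
$$\alpha \not\equiv \beta \pmod L \Rightarrow \alpha \bmod L \neq \beta \bmod L.$$
By Lemma 7.24, the following Corollary is immediate.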
Corollary 7.9 If $L = L(B) \subset \mathbb{Z}^n$ is an integer lattice, then $F(B^*) \cap \mathbb{Z}^n$ is a set of representatives of $\mathbb{Z}^n/L$, and
$$|F(B^*) \cap \mathbb{Z}^n| = d(L). \tag{7.129}$$

If $B$ is the HNF basis of the integer lattice $L$, then $B^* = \mathrm{diag}(b_{11}, b_{22}, \ldots, b_{nn})$, and the parallelepiped $F(B^*)$ takes the simplest form:
$$F(B^*) = \{(x_1, x_2, \ldots, x_n) : 0 \le x_i < b_{ii}\}. \tag{7.130}$$
This is a rectangular box of volume $d(L)$. Thus
$$\mathbb{Z}^n/L \cong F(B^*) \cap \mathbb{Z}^n = \{(x_1, x_2, \ldots, x_n) : 0 \le x_i < b_{ii},\ x_i \in \mathbb{Z}\}. \tag{7.131}$$
This gives another proof of Lemma 7.24.

$\alpha \bmod L$ is called the reduction vector of $\alpha$ modulo $L$. For any $\alpha \in \mathbb{Z}^n$, the number of bits needed to represent the reduction vector $\alpha \bmod L$ is
$$\sum_{i=1}^{n}\log b_{ii} = \log\prod_{i=1}^{n}b_{ii} = \log d(L). \tag{7.132}$$


To sum up, the parallelepiped of the HNF basis of $L$ has a particularly simple geometry: it is a rectangular box. This makes it easy to compute the reduction vector $x \bmod L$ of an integer point $x \in \mathbb{Z}^n$. The reduction vector plays an important role in further improving and analyzing the GGH/HNF cryptosystem. For details, see D. Micciancio's 2001 paper (Micciancio, 2001).
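An HNF basis is upper triangular, so the reduction vector $\alpha \bmod L$ can be computed one coordinate at a time. Starting from the last coordinate and moving upward, subtract $\lfloor \alpha_i / b_{ii}\rfloor\beta_i$ at each step. The Python sketch below is one way to do this. The helper names are our own, and the $3\times 3$ HNF matrix is a made-up example with $d(L) = 24$. It also checks Corollary 7.9: reducing many integer points yields exactly $d(L)$ distinct representatives, all inside the box (7.131).

```python
import math
from itertools import product

def reduce_mod_hnf(H, a):
    """alpha mod L for an upper triangular HNF matrix H (H[i][j] = row i, column j; columns = basis)."""
    n = len(H)
    a = list(a)
    for i in reversed(range(n)):
        k = a[i] // H[i][i]            # floor division, so that afterwards 0 <= a[i] < b_ii
        for r in range(i + 1):         # column i is nonzero only in rows 0..i
            a[r] -= k * H[r][i]
    return a

def in_lattice(H, v):
    """Check v in L(H) by back substitution: every coordinate must come out an integer."""
    n = len(H)
    v = list(v)
    for i in reversed(range(n)):
        if v[i] % H[i][i]:
            return False
        k = v[i] // H[i][i]
        for r in range(i + 1):
            v[r] -= k * H[r][i]
    return True

H = [[4, 1, 3],
     [0, 3, 2],
     [0, 0, 2]]          # hypothetical HNF basis, d(L) = 4 * 3 * 2 = 24
reps = {tuple(reduce_mod_hnf(H, a)) for a in product(range(-5, 6), repeat=3)}
print(len(reps), "classes;", math.log2(4 * 3 * 2), "bits per reduction vector, cf. (7.132)")
```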


7.7 NTRU Cryptosystem 


The NTRU cryptosystem is a public key cryptosystem proposed in 1996 by the Number Theory Research Unit (NTRU), formed by three number theorists at Brown University in the USA: J. Hoffstein, J. Pipher and J. Silverman. Its main features are that key generation is very simple and that the encryption and decryption algorithms are much faster than the commonly used RSA and elliptic curve cryptography. In particular, NTRU is believed to resist quantum computing attacks and is considered a potential replacement for RSA in the post-quantum era.

The essence of the NTRU design is a generalization of RSA to polynomials, so it is called a cryptosystem based on polynomial rings. However, NTRU can be given a completely equivalent form using the concept of q-ary lattices, so NTRU is also a lattice-based cryptosystem. For simplicity, we start with polynomial rings.

Let $\mathbb{Z}[x]$ be the polynomial ring with integer coefficients and $N > 1$ a positive integer. We define the polynomial quotient ring $R$ as
$$R = \mathbb{Z}[x]/(x^N - 1) = \{a_0 + a_1x + \cdots + a_{N-1}x^{N-1} : a_i \in \mathbb{Z}\}.$$
Any $F(x) \in R$ can be written as an integer vector:
$$F(x) = \sum_{i=0}^{N-1}F_ix^i = (F_0, F_1, \ldots, F_{N-1}) \in \mathbb{Z}^N. \tag{7.133}$$
In $R$, we define a new operation $\circledast$, called the convolution of two polynomials. Let
$$F(x) = \sum_{i=0}^{N-1}F_ix^i, \qquad G(x) = \sum_{i=0}^{N-1}G_ix^i.$$
Define
$$F \circledast G = H(x) = \sum_{k=0}^{N-1}H_kx^k = (H_0, H_1, \ldots, H_{N-1}),$$
where, for any $k$ with $0 \le k \le N-1$,
$$H_k = \sum_{i=0}^{k}F_iG_{k-i} + \sum_{i=k+1}^{N-1}F_iG_{N+k-i} = \sum_{\substack{0\le i<N,\ 0\le j<N\\ i+j\equiv k \pmod N}}F_iG_j. \tag{7.134}$$
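Definition (7.134) translates directly into code. The Python sketch below implements $\circledast$. It then compares the result with ordinary polynomial multiplication followed by folding $x^{N+k}$ onto $x^k$, which is valid because $x^N = 1$ in $R$. The function names are illustrative.

```python
def convolve(F, G):
    """F ⊛ G in Z[x]/(x^N - 1), Eq. (7.134): H_k = sum of F_i G_j over i + j ≡ k (mod N)."""
    N = len(F)
    H = [0] * N
    for i in range(N):
        for j in range(N):
            H[(i + j) % N] += F[i] * G[j]
    return H

def multiply_then_fold(F, G):
    """Reference computation: ordinary product in Z[x], then use x^N = 1 to fold x^(N+k) onto x^k."""
    N = len(F)
    prod = [0] * (2 * N - 1)
    for i, f in enumerate(F):
        for j, g in enumerate(G):
            prod[i + j] += f * g
    return [prod[k] + (prod[k + N] if k + N < 2 * N - 1 else 0) for k in range(N)]

F, G = [1, 2, 3], [0, 1, 0]          # F = 1 + 2x + 3x^2, G = x
print(convolve(F, G))                # x F(x) = 3 + x + 2x^2 in R, printed as [3, 1, 2]
```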


Lemma 7.46 Under the new multiplication, $R$ is a commutative ring with identity.

Proof By (7.134),
$$F \circledast G = G \circledast F, \qquad F \circledast (G + H) = F \circledast G + F \circledast H.$$
So $R$ forms a commutative ring under $\circledast$. If $a \in \mathbb{Z}$ is regarded as a constant polynomial in $R$, then
$$a \circledast F = aF = (aF_0, aF_1, \ldots, aF_{N-1}).$$
Therefore, $R$ has the identity element $a = 1$. The Lemma holds.


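Let $F(x) = (F_0, F_1, \ldots, F_{N-1}) \in R$. Define
$$\bar F = \frac1N\sum_{i=0}^{N-1}F_i, \tag{7.135}$$
the arithmetic mean of the coefficients of $F$. The $L_2$ norm (Euclidean norm) and the $L_\infty$ norm of $F$ are defined as
$$|F|_2 = \Big(\sum_{i=0}^{N-1}(F_i - \bar F)^2\Big)^{1/2}, \qquad |F|_\infty = \max_{0\le i\le N-1}F_i - \min_{0\le j\le N-1}F_j. \tag{7.136}$$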

Definition 7.11 Let $d_1, d_2$ be two positive integers with $d_1 + d_2 \le N$. Define the polynomial set $A(d_1, d_2)$ as
$$A(d_1, d_2) = \{F \in R : F \text{ has } d_1 \text{ coefficients equal to } 1,\ d_2 \text{ coefficients equal to } -1, \text{ and all other coefficients } 0\}. \tag{7.137}$$

Lemma 7.47 Let $1 \le d \le [\frac N2]$.
(i) If $F \in A(d, d-1)$, then
$$|F|_2 = \sqrt{2d - 1 - \frac1N}.$$
(ii) If $F \in A(d, d)$, then
$$|F|_2 = \sqrt{2d}.$$
Proof If $F \in A(d, d-1)$, then by (7.135), $\bar F = \frac1N$, thus
$$|F|_2^2 = \sum_{i=0}^{N-1}F_i^2 - N\bar F^2 = (2d - 1) - \frac1N,$$
so (i) holds. If $F \in A(d, d)$, then $\bar F = 0$, thus
$$|F|_2^2 = 2d \Rightarrow |F|_2 = \sqrt{2d}.$$
The Lemma holds.
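Lemma 7.47 is easy to confirm with exact rational arithmetic. This Python sketch (helper names are our own) evaluates the norms (7.136) for elements of $A(d, d-1)$ and $A(d, d)$. The norms do not depend on where the nonzero coefficients sit, so one fixed arrangement suffices.

```python
from fractions import Fraction

def centered_norm_sq(F):
    """|F|_2^2 from (7.136): the sum of (F_i - mean)^2, computed exactly."""
    mean = Fraction(sum(F), len(F))          # Eq. (7.135)
    return sum((c - mean) ** 2 for c in F)

def sup_norm(F):
    """|F|_infinity from (7.136): maximum coefficient minus minimum coefficient."""
    return max(F) - min(F)

def member_of_A(N, d1, d2):
    """One element of A(d1, d2): d1 ones, d2 minus ones, the rest zero."""
    return [1] * d1 + [-1] * d2 + [0] * (N - d1 - d2)

N, d = 11, 3
print(centered_norm_sq(member_of_A(N, d, d - 1)), 2 * d - 1 - Fraction(1, N))   # Lemma 7.47(i), squared
print(centered_norm_sq(member_of_A(N, d, d)), 2 * d)                           # Lemma 7.47(ii), squared
```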


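The parameters of the NTRU cryptosystem are three positive integers $N, q, p$, where $1 < p < q$ and $(p, q) = 1$, that is,
$$\text{parameter system} = \{(N, q, p) : 1 < p < q,\ (p, q) = 1\}.$$
With the parameters $(N, q, p)$ fixed, we now discuss key generation for NTRU.

Key generation. Each NTRU user selects two polynomials $f \in R$ and $g \in R$ with $\deg f = \deg g = N - 1$ as the private key. Taking $f = (f_0, f_1, \ldots, f_{N-1})$ and $g = (g_0, g_1, \ldots, g_{N-1})$ as row vectors, we have $(f, g) \in \mathbb{Z}^{2N} \subset \mathbb{R}^{2N}$. Here $f \bmod q$ must be invertible as a polynomial over $\mathbb{Z}_q$, and $f \bmod p$ invertible as a polynomial over $\mathbb{Z}_p$. That is, there exist $F_q \in \mathbb{Z}_q[x]$ and $F_p \in \mathbb{Z}_p[x]$ such that
$$F_q \circledast f \equiv 1 \pmod q, \qquad F_p \circledast f \equiv 1 \pmod p. \tag{7.138}$$
Once the private key $(f, g)$ is selected, the public key $h$ is given by
$$h \equiv F_q \circledast g \pmod q. \tag{7.139}$$
$h$ can be regarded as a polynomial over $\mathbb{Z}_q$. The quotient rings $\mathbb{Z}_q$ and $\mathbb{Z}_p$ are represented as
$$\mathbb{Z}_q = \mathbb{Z}/q\mathbb{Z} = \Big\{a \in \mathbb{Z} : -\frac q2 \le a < \frac q2\Big\},$$
$$\mathbb{Z}_p = \mathbb{Z}/p\mathbb{Z} = \Big\{a \in \mathbb{Z} : -\frac p2 \le a < \frac p2\Big\}.$$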

Encryption transformation. User B wants to use NTRU to send an encrypted message $m$ to user A. First, the plaintext $m$ is encoded as $m \in R$, that is, $m \in \mathbb{Z}^N$, and then reduced mod $p$, that is,
$$m \in \mathbb{Z}_p^N.$$
Then B selects a random polynomial $\phi \in R$ with $\deg\phi = N - 1$ and encrypts with user A's public key $h$. The encryption function $\sigma$ is
$$\sigma(m) = c \equiv p\phi \circledast h + m \pmod q. \tag{7.140}$$
$c$ is the ciphertext received by user A. It is a polynomial over $\mathbb{Z}_q$, equivalently a vector in $\mathbb{Z}_q^N$.

Decryption transformation. After receiving the ciphertext $c$, user A decrypts it with its own private keys $f$ and $F_p$. A first calculates
$$a \equiv f \circledast c \pmod q. \tag{7.141}$$
Here $a$ is taken as a polynomial over $\mathbb{Z}_q$, that is, $a \in \mathbb{Z}_q^N$ is unique. Finally, the decryption transformation $\sigma^{-1}$ is
$$\sigma^{-1}(c) \equiv a \circledast F_p \pmod p. \tag{7.142}$$
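Why is the decryption transformation correct? If the parameters are chosen so that
$$p\phi \circledast h + m \in \mathbb{Z}_q^N, \tag{7.143}$$
then
$$c = p\phi \circledast h + m. \tag{7.144}$$
Similarly, if $c \circledast f \in \mathbb{Z}_q^N$, then $a = f \circledast c$. By (7.142),
$$a \circledast F_p = F_p \circledast f \circledast c \equiv c = p\phi \circledast h + m \pmod p.$$
Thus
$$a \circledast F_p \equiv m \pmod p.$$
Because $m \in \mathbb{Z}_p^N$,
$$\sigma^{-1}(c) \equiv a \circledast F_p \equiv m \pmod p \Rightarrow \sigma^{-1}(c) = m.$$
Therefore, the decryption transformation is correct under condition (7.143) together with $c \circledast f \in \mathbb{Z}_q^N$.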
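Putting (7.138)-(7.142) together, a toy NTRU round trip can be sketched in Python as follows. The decryption step takes the coefficients of $a$ in $[-\frac q2, \frac q2)$, which is how $a \in \mathbb{Z}_q^N$ is taken in (7.141).

Several assumptions here do not come from the text:
- $q$ is taken to be a prime so that a plain Gaussian-elimination inverse mod $q$ works. Practical parameter sets often use a power of 2 for $q$.
- The parameters $N = 11$, $p = 3$, $q = 257$ with $f \in A(4,3)$ and $g, \phi \in A(3,3)$ are far too small for security.

For these choices every coefficient of $p\phi\circledast g + f\circledast m$ has absolute value at most $3\cdot 6 + 7 = 25 < \frac q2$. So decryption never fails with these toy parameters.

```python
import random

def conv(F, G):
    """Cyclic convolution in Z[x]/(x^N - 1), Eq. (7.134)."""
    N = len(F)
    H = [0] * N
    for i in range(N):
        for j in range(N):
            H[(i + j) % N] += F[i] * G[j]
    return H

def center(a, m):
    """Representative of a mod m in [-m/2, m/2), the convention used for Z_q and Z_p above."""
    r = a % m
    return r - m if r >= (m + 1) // 2 else r

def inverse(f, m):
    """Solve f ⊛ F ≡ 1 (mod m) for a prime m by Gaussian elimination; None if f is not invertible."""
    N = len(f)
    # Row k encodes (f ⊛ F)_k = sum_j f[(k - j) % N] * F[j]; the right-hand side is 1 only for k = 0.
    A = [[f[(k - j) % N] % m for j in range(N)] + [int(k == 0)] for k in range(N)]
    for col in range(N):
        piv = next((r for r in range(col, N) if A[r][col]), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], -1, m)
        A[col] = [x * inv % m for x in A[col]]
        for r in range(N):
            if r != col and A[r][col]:
                t = A[r][col]
                A[r] = [(x - t * y) % m for x, y in zip(A[r], A[col])]
    return [A[k][N] for k in range(N)]

def ternary(N, d1, d2, rng):
    """A random element of A(d1, d2) (Definition 7.11)."""
    v = [1] * d1 + [-1] * d2 + [0] * (N - d1 - d2)
    rng.shuffle(v)
    return v

def keygen(N, p, q, df, dg, rng):
    """Private key (f, F_p) and public key h = F_q ⊛ g mod q, Eqs. (7.138)-(7.139)."""
    while True:                                   # retry until f is invertible mod q and mod p
        f = ternary(N, df, df - 1, rng)
        Fq, Fp = inverse(f, q), inverse(f, p)
        if Fq is not None and Fp is not None:
            break
    g = ternary(N, dg, dg, rng)
    h = [center(x, q) for x in conv(Fq, g)]
    return (f, Fp), h

def encrypt(m, h, p, q, dphi, rng):
    """c = p phi ⊛ h + m (mod q), Eq. (7.140), with a random phi in A(dphi, dphi)."""
    phi = ternary(len(h), dphi, dphi, rng)
    return [center(p * x + mi, q) for x, mi in zip(conv(phi, h), m)]

def decrypt(c, sk, p, q):
    """a = f ⊛ c (mod q) taken in [-q/2, q/2), then a ⊛ F_p (mod p), Eqs. (7.141)-(7.142)."""
    f, Fp = sk
    a = [center(x, q) for x in conv(f, c)]
    return [center(x, p) for x in conv(a, Fp)]

rng = random.Random(2024)
N, p, q = 11, 3, 257                     # toy parameters only
sk, h = keygen(N, p, q, df=4, dg=3, rng=rng)
m = [rng.choice([-1, 0, 1]) for _ in range(N)]
c = encrypt(m, h, p, q, dphi=3, rng=rng)
print("decrypted correctly:", decrypt(c, sk, p, q) == m)
```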

NTRU's encryption and decryption transformations cannot guarantee 100% correct decryption, because $a$ is taken as a polynomial mod $q$ in the decryption operation (see (7.142)). For (7.144) and $c \circledast f \in \mathbb{Z}_q^N$ to hold, the following condition is necessary:
$$|f \circledast c|_\infty = |f \circledast (p\phi \circledast h + m)|_\infty < q. \tag{7.145}$$
As a sufficient condition, (7.145) holds whenever
$$|f \circledast m|_\infty < \frac q4 \quad \text{and} \quad |p\phi \circledast g|_\infty < \frac q4. \tag{7.146}$$


Lemma 7.48 For any $\varepsilon > 0$, there are constants $r_1, r_2 > 0$, depending only on $\varepsilon$ and $N$, with the following property. For randomly selected polynomials $F, G \in R$, the probability of satisfying the inequality below is at least $1 - \varepsilon$:
$$P\{r_1|F|_2|G|_2 \le |F \circledast G|_\infty \le r_2|F|_2|G|_2\} \ge 1 - \varepsilon.$$
Proof See Hoffstein et al. (1998) in the references of this chapter.
By this Lemma, to satisfy (7.146) we choose three parameters $d_f$, $d_g$ and $d_\phi$, where
$$f \in A(d_f, d_f - 1), \quad g \in A(d_g, d_g), \quad \phi \in A(d_\phi, d_\phi). \tag{7.147}$$
By Lemma 7.47, $|f|_2$, $|g|_2$ and $|\phi|_2$ are known. We choose
$$|f|_2 \cdot |m|_2 \approx \frac{q}{4r_2}, \qquad |\phi|_2 \cdot |g|_2 \approx \frac{q}{4pr_2}. \tag{7.148}$$
Then Eq. (7.146) holds in the probabilistic sense, so the success rate of the decryption algorithm exceeds $1 - \varepsilon$. Thus, (7.148) becomes the main parameter selection criterion of NTRU.

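Next, we use the concept of q-ary lattices to give an equivalent description of NTRU. We start from cyclic matrices. Let $T$ and $T_1$ be the following two square matrices of order $N$:
$$T = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}, \qquad T_1 = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$
Then $T^N = T_1^N = I_N$ and $T_1 = T'$. Because $T$ is an orthogonal matrix, $T_1 = T^{-1}$. Here $I_N$ is the identity matrix of order $N$ and $T'$ is the transpose of $T$. For $a = (a_1, a_2, \ldots, a_N) \in \mathbb{R}^N$, it is easy to verify that
$$T\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} = \begin{bmatrix} a_N \\ a_1 \\ \vdots \\ a_{N-1} \end{bmatrix}, \qquad (a_1, a_2, \ldots, a_N)T_1 = (a_N, a_1, a_2, \ldots, a_{N-1}). \tag{7.149}$$
From now on, $a = (a_1, a_2, \ldots, a_N)' \in \mathbb{R}^N$ denotes a column vector unless stated otherwise. The cyclic matrix $T^*(a)$ of order $N$ generated by $a$ is defined as
$$T^*(a) = [a, Ta, T^2a, \ldots, T^{N-1}a]. \tag{7.150}$$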
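The cyclic matrix (7.150) is easy to build, and building it shows its link to the convolution (7.134). Read $a = (a_1, \ldots, a_N)'$ as the coefficients of $a_1 + a_2x + \cdots + a_Nx^{N-1}$. Then the product $T^*(a)x$ is exactly $a \circledast x$, because $T^ja$ corresponds to $x^j a(x)$. The Python sketch below checks this and (7.149). It uses illustrative helper names and 0-based indices.

```python
def shift_down(a):
    """T a for a column vector a: (a_1, ..., a_N) -> (a_N, a_1, ..., a_{N-1}), Eq. (7.149)."""
    return [a[-1]] + a[:-1]

def cyclic_matrix(a):
    """T*(a) = [a, Ta, ..., T^{N-1} a], Eq. (7.150), returned as a list of rows."""
    cols, v = [], list(a)
    for _ in range(len(a)):
        cols.append(v)
        v = shift_down(v)
    return [[col[i] for col in cols] for i in range(len(a))]

def matvec(M, x):
    return [sum(m * y for m, y in zip(row, x)) for row in M]

def conv(F, G):
    """Cyclic convolution (7.134), coefficients indexed from x^0."""
    N = len(F)
    H = [0] * N
    for i in range(N):
        for j in range(N):
            H[(i + j) % N] += F[i] * G[j]
    return H

a = [1, 2, 3, 4]
for row in cyclic_matrix(a):
    print(row)
```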


If $b = (b_1, b_2, \ldots, b_N) \in \mathbb{R}^N$ is a row vector, we define the matrix of order $N$
$$T_1^*(b) = \begin{bmatrix} b \\ bT_1 \\ \vdots \\ bT_1^{N-1} \end{bmatrix}. \tag{7.151}$$
To distinguish them in formulas, $T^*(a)$ and $T_1^*(a)$ are sometimes written as $T^*a$ and $T_1^*a$, or as $[T^*a]$ and $[T_1^*a]$. Clearly, the transpose of $T^*(a)$ is
$$(T^*(a))' = \begin{bmatrix} a' \\ (Ta)' \\ \vdots \\ (T^{N-1}a)' \end{bmatrix} = T_1^*(a'). \tag{7.152}$$

Equation (7.150) partitions the cyclic matrix into column vectors. To obtain its partition into row vectors, for any $x = (x_1, \ldots, x_N) \in \mathbb{R}^N$ let
$$\bar x = (x_N, \ldots, x_1) \Rightarrow \bar{\bar x} = x,$$
and define $\bar x$ for column vectors $x$ in the same way. Then for any column vector $a \in \mathbb{R}^N$,
$$T^*(a) = [a, Ta, T^2a, \ldots, T^{N-1}a] = \begin{bmatrix} \bar a'T_1 \\ \bar a'T_1^2 \\ \vdots \\ \bar a'T_1^N \end{bmatrix}. \tag{7.153}$$
The right side of (7.153) is the cyclic matrix partitioned by rows. We first prove that the transpose of a cyclic matrix is still a cyclic matrix.

Lemma 7.49 For any column vector $a = (a_1, a_2, \ldots, a_N)' \in \mathbb{R}^N$, $(T^*(a))' = T^*(T\bar a)$.

Proof Let $a = (a_1, \ldots, a_N)' \in \mathbb{R}^N$. By (7.152), $(T^*(a))' = T_1^*(a')$, where $a' = (a_1, \ldots, a_N)$ is the transpose of $a$. Let
$$\beta = (a_1, a_N, a_{N-1}, \ldots, a_2) = \bar a'T_1.$$
Using (7.153), it is easy to verify that
$$T_1^*(\beta) = \begin{bmatrix} \beta \\ \beta T_1 \\ \vdots \\ \beta T_1^{N-1} \end{bmatrix} = T^*(a).$$
Hence, by (7.152),
$$(T^*(a))' = (T_1^*(\beta))' = T^*(\beta').$$
Because $\bar a' = (a_N, a_{N-1}, \ldots, a_2, a_1)$ and $\beta = \bar a'T_1$, we get
$$\beta' = T_1'\bar a = T\bar a.$$
Therefore
$$(T^*(a))' = T^*(T\bar a).$$
This completes the proof of the Lemma.
Next, we give an equivalent characterization of cyclic matrices.

Lemma 7.50 Let $A = (a_{ij})_{N\times N}$, and let $a = (a_{11}, a_{21}, \ldots, a_{N1})' \in \mathbb{R}^N$ be the first column of $A$. Then $A = T^*(a)$ is a cyclic matrix if and only if, for all $1 \le k \le N$, $1 + i - j \equiv k \pmod N$ implies $a_{ij} = a_{k1}$.

Proof If $A = T^*(a)$ is a cyclic matrix, simple observation gives
$$a_{11} = a_{22} = \cdots = a_{NN},$$
$$a_{21} = a_{32} = \cdots = a_{N,N-1} = a_{1N},$$
$$\vdots$$
$$a_{N1} = a_{12} = a_{23} = \cdots = a_{N-1,N}.$$
Thus, when $i \ge j$, $a_{ij} = a_{k1}$ with $k = 1 + i - j$. The same applies when $i < j$, where
$$k = N + 1 + i - j \equiv 1 + i - j \pmod N.$$
So the Lemma holds.

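The following lemma describes the main properties of cyclic matrices.

Lemma 7.51 If $a = (a_1, \ldots, a_N)'$ and $b = (b_1, \ldots, b_N)'$ are two column vectors, then:
(i) $T^*(a) + T^*(b) = T^*(a + b)$.
(ii) $T^*(a)\cdot T^*(b) = T^*([T^*a]\cdot b)$, and $T^*(a)T^*(b) = T^*(b)T^*(a)$.
(iii) $\det(T^*(a)) = \prod_{k=1}^{N}(a_1 + a_2\xi_k + \cdots + a_N\xi_k^{N-1})$, where $\xi_k$ $(1 \le k \le N)$ runs over all the $N$-th roots of unity.
(iv) If the cyclic matrix $T^*(a)$ is invertible, then its inverse is $(T^*(a))^{-1} = T^*(b)$, where $b$ is the first column of $(T^*(a))^{-1}$.

Proof (i) is trivial, because
$$T^*(a) + T^*(b) = [a + b, T(a + b), \ldots, T^{N-1}(a + b)] = T^*(a + b).$$
To prove (ii), we use the row-vector partition of the cyclic matrix (see (7.153)):
$$[T^*(a)]b = \begin{bmatrix} \bar a'T_1 \\ \bar a'T_1^2 \\ \vdots \\ \bar a'T_1^N \end{bmatrix}b = \begin{bmatrix} \bar a'T_1b \\ \bar a'T_1^2b \\ \vdots \\ \bar a'T_1^Nb \end{bmatrix},$$
and
$$T^*(a)\cdot T^*(b) = \begin{bmatrix} \bar a'T_1 \\ \bar a'T_1^2 \\ \vdots \\ \bar a'T_1^N \end{bmatrix}[b, Tb, \ldots, T^{N-1}b] = (A_{ij})_{N\times N},$$
where
$$A_{ij} = \bar a'T_1^iT^{j-1}b = \bar a'T_1^{1+i-j}b.$$
Since $A_{ij}$ depends only on $1 + i - j \bmod N$, Lemma 7.50 gives $T^*(a)\cdot T^*(b) = T^*([T^*(a)]b)$, which is the first conclusion of (ii). For the second conclusion, it is easy to prove that for any row vector $x$ and column vector $y$ we have $x\cdot y = \bar x\cdot\bar y$, and
$$\overline{xT_1^k} = \bar xT_1^{N-k}, \quad 1 \le k \le N. \tag{7.154}$$
Thus, taking the transpose of the scalar in the last step and using $(T_1^{N-k})' = T^{N-k} = T_1^k$,
$$A_{ij} = \bar a'T_1^{1+i-j}b = \big(a'T_1^{N-(1+i-j)}\big)\bar b = \bar b'T_1^{1+i-j}a.$$
This is exactly the $(i, j)$ entry of $T^*(b)T^*(a)$, which proves $T^*(a)T^*(b) = T^*(b)T^*(a)$. That is, multiplication of cyclic matrices is commutative.

To prove (iii), let $A = (T^*(a))'$. Since $\det(T^*(a)) = \det((T^*(a))')$, we only need to calculate $\det(A)$. Let $f(x) = a_1 + a_2x + \cdots + a_Nx^{N-1}$, and let
$$V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \xi_1 & \xi_2 & \cdots & \xi_N \\ \xi_1^2 & \xi_2^2 & \cdots & \xi_N^2 \\ \vdots & \vdots & & \vdots \\ \xi_1^{N-1} & \xi_2^{N-1} & \cdots & \xi_N^{N-1} \end{bmatrix}.$$
Then
$$AV = \begin{bmatrix} f(\xi_1) & f(\xi_2) & \cdots & f(\xi_N) \\ \xi_1f(\xi_1) & \xi_2f(\xi_2) & \cdots & \xi_Nf(\xi_N) \\ \vdots & \vdots & & \vdots \\ \xi_1^{N-1}f(\xi_1) & \xi_2^{N-1}f(\xi_2) & \cdots & \xi_N^{N-1}f(\xi_N) \end{bmatrix}.$$
So
$$\det(A)\det(V) = \det(AV) = f(\xi_1)f(\xi_2)\cdots f(\xi_N)\det(V).$$
Because the $\xi_k$ are pairwise distinct, $\det(V) \neq 0$, so
$$\det(A) = f(\xi_1)f(\xi_2)\cdots f(\xi_N) = \prod_{k=1}^{N}f(\xi_k) = \prod_{k=1}^{N}(a_1 + a_2\xi_k + \cdots + a_N\xi_k^{N-1}).$$
Now we prove (iv). Let $e = (1, 0, \ldots, 0)' \in \mathbb{R}^N$. Then
$$T^*(e) = [e, Te, \ldots, T^{N-1}e] = I_N.$$
So take $b \in \mathbb{R}^N$ satisfying
$$T^*(a)\cdot b = e \Rightarrow b = (T^*(a))^{-1}e.$$
Clearly, $b$ is the first column of $(T^*(a))^{-1}$. By (ii),
$$T^*(a)T^*(b) = T^*([T^*(a)]b) = T^*(e) = I_N.$$
Thus $(T^*(a))^{-1} = T^*(b)$. In other words, the inverse of an invertible cyclic matrix is also a cyclic matrix.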
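Properties (ii)-(iv) of Lemma 7.51 can be spot-checked numerically. The Python sketch below uses our own helpers: exact arithmetic via `Fraction`, and complex roots of unity from `cmath` for (iii). The vectors $a$ and $b$ are arbitrary integer examples with $N = 5$. For this $a$, $\det T^*(a) \neq 0$, as the test confirms, so the inverse in (iv) exists.

```python
import cmath
from fractions import Fraction

def cyclic_matrix(a):
    """T*(a): entry (i, j) is a[(i - j) mod N], i.e. column j is a shifted down j places (7.150)."""
    N = len(a)
    return [[a[(i - j) % N] for j in range(N)] for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matvec(A, x):
    return [sum(r * y for r, y in zip(row, x)) for row in A]

def det(A):
    """Exact determinant by Gaussian elimination over Fractions."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    d = Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            M[r] = [x - t * y for x, y in zip(M[r], M[c])]
    return d

def solve(A, rhs):
    """Exact solution of A x = rhs by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(A, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        lead = M[c][c]
        M[c] = [x / lead for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                t = M[r][c]
                M[r] = [x - t * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]

def det_by_roots(a):
    """Lemma 7.51(iii): product over the N-th roots of unity xi of a_1 + a_2 xi + ... + a_N xi^(N-1)."""
    N, prod = len(a), 1
    for k in range(N):
        xi = cmath.exp(2j * cmath.pi * k / N)
        prod *= sum(c * xi ** i for i, c in enumerate(a))
    return prod

a, b = [2, -1, 0, 3, 1], [1, 0, 4, -2, 0]       # arbitrary example vectors, N = 5
A, B = cyclic_matrix(a), cyclic_matrix(b)
print("(ii):", matmul(A, B) == cyclic_matrix(matvec(A, b)), matmul(A, B) == matmul(B, A))
print("(iii):", det(A), det_by_roots(a))
first_col = solve(A, [1, 0, 0, 0, 0])           # first column of A^{-1}
identity = [[int(i == j) for j in range(5)] for i in range(5)]
print("(iv):", matmul(A, cyclic_matrix(first_col)) == identity)
```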
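Corollary 7.10 Let $N$ be a prime, and let $a = (a_1, \ldots, a_N)' \in \mathbb{R}^N$ satisfy $a \neq \mathbf{1}$ and $\sum_{i=1}^{N}a_i \neq 0$. Then the cyclic matrix $T^*(a)$ generated by $a$ is an invertible square matrix.

Proof Under the given conditions, we only need to prove $\det(T^*(a)) \neq 0$. Let $\xi_k = \exp\big(\frac{2\pi\sqrt{-1}\,k}{N}\big)$, $1 \le k \le N-1$, be the $N - 1$ primitive $N$-th roots of unity (all the nontrivial $N$-th roots of unity are primitive because $N$ is a prime). Suppose $\det(T^*(a)) = 0$. The factor of Lemma 7.51(iii) belonging to the root $1$ is $\sum_{i=1}^{N}a_i \neq 0$, so there must be some $k$, $1 \le k \le N-1$, such that
$$a_1 + \xi_ka_2 + \xi_k^2a_3 + \cdots + \xi_k^{N-1}a_N = 0.$$
In other words, $\xi_k$ is a root of the polynomial $\phi(x) = a_1 + a_2x + \cdots + a_Nx^{N-1}$. So $\phi(x)$ and $1 + x + \cdots + x^{N-1}$ have the common root $\xi_k$, and therefore the greatest common divisor of the two polynomials satisfies
$$(\phi(x), 1 + x + \cdots + x^{N-1}) \neq 1.$$
Since $1 + x + \cdots + x^{N-1}$ is a cyclotomic polynomial, it is irreducible. Together with $a \neq \mathbf{1}$, this yields a contradiction. Hence $\det(T^*(a)) \neq 0$, and the Corollary holds.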


Next, we use cyclic matrices to describe the NTRU lattice equivalently. First, we define a linear transformation $\sigma$ on the even-dimensional Euclidean space $\mathbb{R}^{2N}$. If $x, y \in \mathbb{R}^N$ are two column vectors, define
$$\sigma\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Tx \\ Ty \end{bmatrix} \in \mathbb{R}^{2N}. \tag{7.155}$$
Equivalently, if $x, y \in \mathbb{R}^N$ are two row vectors, define
$$\sigma(x, y) = (xT_1, yT_1) \in \mathbb{R}^{2N}. \tag{7.156}$$
Clearly, $\sigma$ is a linear transformation $\mathbb{R}^{2N} \to \mathbb{R}^{2N}$.


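Definition 7.12 An integer lattice $L \subset \mathbb{R}^{2N}$ is called a convolution q-ary lattice if
(i) $L$ is a q-ary lattice, that is, $q\mathbb{Z}^{2N} \subset L \subset \mathbb{Z}^{2N}$;
(ii) $L$ is closed under the linear transformation $\sigma$, that is, for column vectors $x, y \in \mathbb{R}^N$,
$$\begin{bmatrix} x \\ y \end{bmatrix} \in L \Rightarrow \sigma\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Tx \\ Ty \end{bmatrix} \in L.$$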


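Recall that NTRU's private key consists of two polynomials of degree $N - 1$, $f = \sum_{i=0}^{N-1}f_ix^i$ and $g = \sum_{i=0}^{N-1}g_ix^i$. Write $f$ and $g$ as column vectors:
$$f = \begin{bmatrix} f_0 \\ \vdots \\ f_{N-1} \end{bmatrix} \in \mathbb{Z}^N, \qquad f' = (f_0, f_1, \ldots, f_{N-1}) \in \mathbb{Z}^N,$$
and
$$g = \begin{bmatrix} g_0 \\ \vdots \\ g_{N-1} \end{bmatrix} \in \mathbb{Z}^N, \qquad g' = (g_0, g_1, \ldots, g_{N-1}) \in \mathbb{Z}^N.$$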


The parameters of NTRU are $N$, $q$, $p$, where $N$ is a prime and $p < q$ are two positive integers. Define the polynomial set
$$A_d\{p, 0, -p\} = \{f(x) \in \mathbb{Z}^N : d + 1 \text{ coefficients of } f \text{ are } p,\ d \text{ coefficients of } f \text{ are } -p, \text{ and the others are } 0\}. \tag{7.157}$$
Select two polynomials $f, g \in \mathbb{Z}^N$ of degree $N - 1$ and a positive integer parameter $d_f$ that satisfy the following restrictions:

(A) $N, p, q, d_f$ are positive integers, $N$ is a prime, $1 < p < q$ and $(p, q) = 1$;
(B) $f$ and $g$ are two polynomials of degree $N - 1$, the constant term of $f$ is 1, and
$$f - 1 \in A_{d_f}\{p, 0, -p\}, \qquad g \in A_{d_f}\{p, 0, -p\};$$
(C) $T^*(f)$ is invertible mod $q$.

Conditions (A)-(C) above are the parameter constraints of NTRU. Under these conditions, $T^*(f)$ and $T^*(g)$ are invertible matrices, and
$$T^*(f) \equiv I_N \pmod p, \qquad T^*(g) \equiv 0 \pmod p. \tag{7.158}$$

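Once polynomials $f$ and $g$ satisfying the above conditions are selected as the private key, we have $\begin{bmatrix} f \\ g \end{bmatrix} \in \mathbb{Z}^{2N}$. Let us construct a minimal convolution q-ary lattice containing $\begin{bmatrix} f \\ g \end{bmatrix}$. Set
$$A = [(T^*(f))', (T^*(g))'] \quad \text{and} \quad A' = \begin{bmatrix} T^*(f) \\ T^*(g) \end{bmatrix}. \tag{7.159}$$
Regard $A$ as an $N \times 2N$ matrix over $\mathbb{Z}_q$, that is, $A \in \mathbb{Z}_q^{N\times 2N}$. Then by (7.45), $A$ defines a $2N$-dimensional q-ary lattice $\Lambda_q(A)$:
$$\Lambda_q(A) = \{y \in \mathbb{Z}^{2N} : \text{there is } x \in \mathbb{Z}^N \text{ such that } y \equiv A'x \pmod q\}. \tag{7.160}$$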
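We prove that $\Lambda_q(A)$ is a convolution q-ary lattice containing $\begin{bmatrix} f \\ g \end{bmatrix}$. First, we prove the following general identity.

Lemma 7.52 Suppose $a = (a_1, \ldots, a_N)' \in \mathbb{R}^N$. Then for $x \in \mathbb{R}^N$ and $0 \le k \le N - 1$,
$$T^k(T^*(a)x) = T^*(a)(T^kx), \quad \text{where } T^0 = I_N.$$
Proof The case $k = 0$ is trivial, and it clearly suffices to prove the case $k = 1$, that is,
$$T(T^*(a)x) = T^*(a)(Tx). \tag{7.161}$$
By (7.153),
$$T(T^*(a)x) = T\begin{bmatrix} \bar a'T_1x \\ \bar a'T_1^2x \\ \vdots \\ \bar a'T_1^Nx \end{bmatrix} = \begin{bmatrix} \bar a'T_1^Nx \\ \bar a'T_1x \\ \vdots \\ \bar a'T_1^{N-1}x \end{bmatrix}.$$
Because $T = T_1^{-1}$, i.e., $T_1^kT = T_1^{k-1}$, the right side of Eq. (7.161) is
$$T^*(a)(Tx) = \begin{bmatrix} \bar a'T_1Tx \\ \bar a'T_1^2Tx \\ \vdots \\ \bar a'T_1^NTx \end{bmatrix} = \begin{bmatrix} \bar a'x \\ \bar a'T_1x \\ \vdots \\ \bar a'T_1^{N-1}x \end{bmatrix}.$$
Since $\bar a'T_1^Nx = \bar a'x$, (7.161) holds, and hence the Lemma holds.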
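Lemma 7.53 $\Lambda_q(A)$ is a convolution q-ary lattice, and $\begin{bmatrix} f \\ g \end{bmatrix} \in \Lambda_q(A)$.

Proof By Lemma 7.27, $\Lambda_q(A)$ is a q-ary lattice, that is, $q\mathbb{Z}^{2N} \subset \Lambda_q(A) \subset \mathbb{Z}^{2N}$. So we only need to prove that $\Lambda_q(A)$ is closed under the linear transformation $\sigma$. If $y \in \Lambda_q(A)$, then there is $x \in \mathbb{Z}^N$ with $y \equiv A'x \pmod q$. By the definition of $\sigma$ and Lemma 7.52,
$$\sigma(y) \equiv \begin{bmatrix} T(T^*(f)x) \\ T(T^*(g)x) \end{bmatrix} = \begin{bmatrix} T^*(f)(Tx) \\ T^*(g)(Tx) \end{bmatrix} = A'(Tx) \pmod q.$$
Because $x \in \mathbb{Z}^N \Rightarrow Tx \in \mathbb{Z}^N$, we get $\sigma(y) \in \Lambda_q(A)$. That is, $\Lambda_q(A)$ is a convolution q-ary lattice.

Next we prove $\begin{bmatrix} f \\ g \end{bmatrix} \in \Lambda_q(A)$. Let $e = (1, 0, \ldots, 0)' \in \mathbb{Z}^N$. Then $T^*(f)e$ is the first column of $T^*(f)$, that is,
$$T^*(f)e = f, \qquad T^*(g)e = g.$$
Thus
$$\begin{bmatrix} f \\ g \end{bmatrix} = \begin{bmatrix} T^*(f)e \\ T^*(g)e \end{bmatrix} = A'e \in \Lambda_q(A).$$
The Lemma holds.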


With this preparation, we now introduce the equivalent lattice form of NTRU.

Public key generation. After the private key $\begin{bmatrix} f \\ g \end{bmatrix} \in \mathbb{Z}^{2N}$ is selected, NTRU's public key is generated as follows. The convolution q-ary lattice $\Lambda_q(A)$ containing $\begin{bmatrix} f \\ g \end{bmatrix}$ is an integer lattice, so $\Lambda_q(A)$ has a unique HNF basis $H$, where
$$H = \begin{bmatrix} I_N & 0 \\ T^*(h) & qI_N \end{bmatrix}, \qquad h \equiv [T^*(f)]^{-1}g \pmod q. \tag{7.162}$$
By (7.48) of Lemma 7.28, the determinant of $\Lambda_q(A)$ is
$$d(\Lambda_q(A)) = q^{2N - N} = q^N.$$
So the diagonal blocks of $H$ are $I_N$ and $qI_N$. By assumption, $T^*(f) \in \mathbb{Z}^{N\times N}$ is invertible mod $q$, and $[T^*(f)]^{-1}$ denotes the inverse matrix of $T^*(f)$ mod $q$. The vector $h \in \mathbb{Z}^N$ has each component $h_i$ chosen in the range $-\frac q2 \le h_i < \frac q2$, and such an $h$ is unique. It is not difficult to verify that $H$ is an HNF matrix and that the lattice generated by $H$ is $\Lambda_q(A)$, so $H$ is the HNF basis of $\Lambda_q(A)$. $H$ is published as the public key.

Encryption transformation. The message sender encodes the plaintext as $m \in \mathbb{Z}^N$ and randomly selects a vector $r \in \mathbb{Z}^N$ such that
$$m \in A_{d_f}\{1, 0, -1\}, \qquad r \in A_{d_f}\{1, 0, -1\}. \tag{7.163}$$
That is, $m$ has $d_f + 1$ components equal to 1, $d_f$ components equal to $-1$, and all other components 0. Then the plaintext $m$ is encrypted with the message recipient's public key $H$:
$$c \equiv \begin{bmatrix} m + [T^*(h)]r \\ 0 \end{bmatrix} \pmod q. \tag{7.164}$$
$c$ is called the ciphertext. Its first $N$ components are $m + [T^*(h)]r$, and its last $N$ components are 0.
Decryption transformation. If all components of $m + [T^*(h)]r$ lie in the interval $[-\frac q2, \frac q2)$, the message receiver can determine that the ciphertext $c$ is
$$c = \begin{bmatrix} m + [T^*(h)]r \\ 0 \end{bmatrix}.$$
The receiver then decrypts it with its own private key $T^*(f)$:
$$[T^*(f)](m + [T^*(h)]r) = [T^*(f)]m + [T^*(f)][T^*(h)]r \equiv [T^*(f)]m + [T^*(g)]r \pmod q. \tag{7.165}$$
By the definition of $h$,
$$[T^*(f)]h \equiv g \pmod q \Rightarrow T^*([T^*(f)]h) \equiv T^*(g) \pmod q.$$
By Lemma 7.51, $T^*([T^*(f)]h) = T^*(f)\cdot T^*(h)$, so
$$T^*(f)T^*(h) \equiv T^*(g) \pmod q.$$
Hence Eq. (7.165) holds.


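If
$$[T^*(f)]m + [T^*(g)]r \in \Big[-\frac q2, \frac q2\Big)^N, \tag{7.166}$$
then the value computed in (7.165), with components taken in $[-\frac q2, \frac q2)$, equals $[T^*(f)]m + [T^*(g)]r$ exactly. Reducing it mod $p$ and using (7.158), we get
$$([T^*(f)]m + [T^*(g)]r) \bmod p = I_Nm + 0\cdot r = m. \tag{7.167}$$
This guarantees that the decryption transformation is correct.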
From the above analysis, the following conditions are necessary to ensure that (7.167) holds:
$$m + [T^*(h)]r \in \Big[-\frac q2, \frac q2\Big)^N, \qquad [T^*(f)]m + [T^*(g)]r \in \Big[-\frac q2, \frac q2\Big)^N. \tag{7.168}$$
The first condition follows from the second; that is, (7.168) follows from (7.166). We first prove the following Lemma.


Lemma 7.54 If the parameter satisfies $d_f < \frac{q/4 - 1}{2p} - \frac12$, that is, $(2d_f + 1)p + 1 < \frac q4$, then
$$[T^*(f)]m + [T^*(g)]r \in \Big[-\frac q2, \frac q2\Big)^N.$$
Proof All components of $m$ and $r$ are $\pm1$ or 0. So we only need to prove that, in each row vector of $[T^*(f)]$ and of $[T^*(g)]$, the sum of the absolute values of the entries is less than $\frac q4$. Write $f' = (f_0, f_1, \ldots, f_{N-1})$. Because $f_0 = 1$,
$$\sum_{i=0}^{N-1}|f_i| = 1 + \sum_{i=1}^{N-1}|f_i| = 1 + (2d_f + 1)p < \frac q4.$$
Similarly,
$$\sum_{i=0}^{N-1}|g_i| = (2d_f + 1)p < \frac q4.$$
Thus
$$[T^*(f)]m + [T^*(g)]r \in \Big(-\frac q2, \frac q2\Big)^N \subset \Big[-\frac q2, \frac q2\Big)^N.$$
The Lemma holds.


By the above lemma, the NTRU algorithm needs the following additional condition to ensure correct decryption:

(D) $d_f < \frac{q/4 - 1}{2p} - \frac12$.

To sum up, suppose the parameter system of the NTRU cryptosystem satisfies restrictions (A)-(D). Then the private key is $\begin{bmatrix} f \\ g \end{bmatrix}$, the public key is the HNF matrix $H$, and encryption and decryption follow the algorithms introduced above.
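The lattice form of NTRU can also be run on toy numbers. The Python sketch below follows (A)-(D), (7.162) and (7.164)-(7.167) with $N = 11$, $p = 3$, $d_f = 1$ and the prime $q = 67$. These values make $(2d_f + 1)p + 1 = 10 < \frac q4$, as Lemma 7.54 requires. Only the first block $m + [T^*(h)]r$ of the ciphertext is stored. The prime $q$, the parameter values and all function names are illustrative assumptions, not taken from the text.

```python
import random

def cyclic_matrix(a):
    """T*(a), Eq. (7.150): entry (i, j) is a[(i - j) mod N]."""
    N = len(a)
    return [[a[(i - j) % N] for j in range(N)] for i in range(N)]

def matvec(A, x):
    return [sum(r * y for r, y in zip(row, x)) for row in A]

def center(a, m):
    """Representative of a mod m in [-m/2, m/2)."""
    r = a % m
    return r - m if r >= (m + 1) // 2 else r

def solve_mod(A, rhs, m):
    """Solve A x ≡ rhs (mod m) for a prime m by Gauss-Jordan elimination; None if A is singular mod m."""
    n = len(A)
    M = [[x % m for x in row] + [b % m] for row, b in zip(A, rhs)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c]), None)
        if p is None:
            return None
        M[c], M[p] = M[p], M[c]
        inv = pow(M[c][c], -1, m)
        M[c] = [x * inv % m for x in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                t = M[r][c]
                M[r] = [(x - t * y) % m for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]

def sample_A(N, d, scale, rng, avoid=()):
    """Random vector with d+1 entries equal to scale and d entries equal to -scale, Eq. (7.157)."""
    positions = rng.sample([i for i in range(N) if i not in avoid], 2 * d + 1)
    v = [0] * N
    for k, pos in enumerate(positions):
        v[pos] = scale if k <= d else -scale
    return v

def keygen(N, p, q, d, rng):
    """Private f, g as in (B); public h = [T*(f)]^{-1} g mod q as in (7.162)."""
    while True:
        f = sample_A(N, d, p, rng, avoid=(0,))
        f[0] = 1                                   # constant term 1, so T*(f) ≡ I_N (mod p)
        g = sample_A(N, d, p, rng)                 # so T*(g) ≡ 0 (mod p)
        h = solve_mod(cyclic_matrix(f), g, q)      # condition (C): retry if T*(f) is singular mod q
        if h is not None:
            return f, g, [center(x, q) for x in h]

def encrypt(m, r, h, q):
    """First block of the ciphertext (7.164): m + [T*(h)] r, reduced into [-q/2, q/2)."""
    return [center(mi + x, q) for mi, x in zip(m, matvec(cyclic_matrix(h), r))]

def decrypt(c, f, p, q):
    """[T*(f)] c reduced into [-q/2, q/2) (Eq. (7.165)), then reduced mod p (Eq. (7.167))."""
    a = [center(x, q) for x in matvec(cyclic_matrix(f), c)]
    return [center(x, p) for x in a]

rng = random.Random(7)
N, p, q, d = 11, 3, 67, 1       # (2d + 1)p + 1 = 10 < q/4, as Lemma 7.54 requires
f, g, h = keygen(N, p, q, d, rng)
m, r = sample_A(N, d, 1, rng), sample_A(N, d, 1, rng)
print("decrypted correctly:", decrypt(encrypt(m, r, h, q), f, p, q) == m)
```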


7.8 McEliece/Niederreiter Cryptosystem 


The McEliece/Niederreiter cryptosystem is based on the asymmetry between encoding and decoding for a special class of linear codes (Goppa codes) over a finite field. It was proposed by McEliece in 1978 and by Niederreiter in 1985, and it belongs to post-quantum cryptography. We start with cyclic codes. Recall the concept of a linear code from Chap. 2. Let $\mathbb{F}_q$ be a finite field with $q$ elements, also called the alphabet; its elements are called letters or characters. The $N$-dimensional linear space $\mathbb{F}_q^N$ over $\mathbb{F}_q$ is called the codeword space of length $N$. Any vector $a = (a_0, a_1, \ldots, a_{N-1}) \in \mathbb{F}_q^N$ is called a codeword of length $N$ and is usually written as $a = a_0a_1\cdots a_{N-1} \in \mathbb{F}_q^N$. From the previous section, we have
$$aT_1 = (a_0, a_1, \ldots, a_{N-1})T_1 = (a_{N-1}, a_0, a_1, \ldots, a_{N-2}). \tag{7.169}$$
The reverse codeword $\bar a$ of a codeword $a = a_0a_1\cdots a_{N-1}$ is defined as
$$\bar a = a_{N-1}a_{N-2}\cdots a_1a_0 \in \mathbb{F}_q^N. \tag{7.170}$$


A $k$-dimensional linear subspace $C \subset \mathbb{F}_q^N$ is called a linear code and is usually written as $C = [N, k]$. When $k = 0$ or $k = N$, the codes $[N, 0]$ and $[N, N]$ are called trivial codes. In fact,
$$[N, 0] = \{0 = 00\cdots0\}, \qquad [N, N] = \mathbb{F}_q^N.$$
The reverse-order code $\bar C$ of a code $C$ is defined as $\bar C = \{\bar c : c \in C\}$. Clearly, if $C = [N, k]$, then $\bar C = [N, k]$.

Definition 7.13 A linear code $C$ of length $N$ is called a cyclic code if $c \in C \Rightarrow cT_1 \in C$ for all $c$.


Next, we use ideal theory to give an algebraic description of cyclic codes. Let $\mathbb{F}_q[x]$ be the univariate polynomial ring over $\mathbb{F}_q$, and let $(x^N - 1)$ be the principal ideal generated by $x^N - 1$. Write $R = \mathbb{F}_q[x]/(x^N - 1)$ for the quotient ring. If $a = a_0a_1\cdots a_{N-1} \in \mathbb{F}_q^N$, then $a(x) = a_0 + a_1x + \cdots + a_{N-1}x^{N-1} \in R$. So $a \mapsto a(x)$ is a one-to-one correspondence $\mathbb{F}_q^N \to R$ and an isomorphism of additive groups. Under this correspondence, we identify the codeword $a$ with the polynomial $a(x)$. That is, $a = a(x)$ and $\mathbb{F}_q^N = R = \mathbb{F}_q[x]/(x^N - 1)$, and for any code $C \subset \mathbb{F}_q^N$,
$$C = C(x) = \{c(x) : c \in C\} \subset R.$$
So a code $C$ is the same as a subset of $\mathbb{F}_q[x]/(x^N - 1)$. The following lemma reveals the algebraic meaning of a cyclic code.


Lemma 7.55 $C \subset \mathbb{F}_q^N$ is a cyclic code $\iff$ $C(x)$ is an ideal of $\mathbb{F}_q[x]/(x^N - 1)$.

Proof If $C(x)$ is an ideal of $\mathbb{F}_q[x]/(x^N - 1)$, then $C$ is clearly a linear code. For any codeword $c = c_0c_1\cdots c_{N-1}$, we have $c(x) = c_0 + c_1x + \cdots + c_{N-1}x^{N-1} \in C(x)$, thus
$$xc(x) = c_{N-1} + c_0x + c_1x^2 + \cdots + c_{N-2}x^{N-1} \in C(x).$$
So $cT_1 = c_{N-1}c_0c_1\cdots c_{N-2} \in C$, and $C$ is a cyclic code over $\mathbb{F}_q$. Conversely, if $C$ is a cyclic code, then $cT_1 \in C$, and therefore $cT_1^k \in C$ for all $0 \le k \le N - 1$, where $T_1^0 = I_N$ is the identity matrix of order $N$. The polynomial corresponding to $cT_1^k$ is
$$cT_1^k(x) = x^kc(x). \tag{7.171}$$
Since $C$ is also a linear subspace, $g(x)c(x) \in C(x)$ for all $g(x) \in R$. This proves that $C(x)$ is an ideal, and the Lemma holds.


Using the homomorphism theorem for rings, we can describe all ideals of $R$. Let $\pi$ be the natural homomorphism $\mathbb{F}_q[x] \to \mathbb{F}_q[x]/(x^N - 1)$; then the ideals of $R$ correspond one-to-one to the ideals $A$ of $\mathbb{F}_q[x]$ containing $\ker\pi = (x^N - 1)$, that is

$$\ker\pi = (x^N - 1) \subset A \subset \mathbb{F}_q[x] \xrightarrow{\ \pi\ } \mathbb{F}_q[x]/(x^N - 1) = R.$$

Since $\mathbb{F}_q[x]$ is a principal ideal ring, $A$ is an ideal of $\mathbb{F}_q[x]$ and $(x^N - 1) \subset A$, we have

$$A = (g(x)), \quad \text{where } g(x) \mid x^N - 1. \tag{7.172}$$


Therefore, all ideals in $R$ are principal and finite in number; they can be listed as

$$\{(g(x)) \bmod x^N - 1 \mid g(x) \text{ divides } x^N - 1\},$$

where $(g(x)) \bmod x^N - 1$ denotes the principal ideal generated by $g(x)$ in $R$, that is

$$(g(x)) \bmod x^N - 1 = \{g(x)f(x) \mid f(x) = 0 \text{ or } 0 \le \deg f(x) \le N - \deg g(x) - 1\}. \tag{7.173}$$

This proves that $\mathbb{F}_q[x]/(x^N - 1)$ is a principal ideal ring, and the number of its ideals is $d + 1$, where $d$ is the number of positive factors of $x^N - 1$. Here a positive factor means a monic factor (leading coefficient 1) of positive degree; the extra ideal comes from $g(x) = 1$, which generates $R$ itself. Therefore, we have the following corollary.


Corollary 7.11 Let $d$ be the number of positive factors of $x^N - 1$; then the number of cyclic codes of length $N$ is $d + 1$.


A cyclic code $C$ corresponds to an ideal $C(x) = (g(x)) \bmod x^N - 1$ in $R$, and we define:

Definition 7.14 Let $C$ be a cyclic code. If $C(x) = (g(x)) \bmod x^N - 1$, where $g(x) \mid x^N - 1$ is monic, then $g(x)$ is called the generating polynomial of $C$.


If $g(x) = x^N - 1$, then $(x^N - 1) \bmod x^N - 1 = 0$, which corresponds to the zero ideal in $R$; the corresponding cyclic code $C = \{0 = 00\cdots0\}$ is called the zero code. If $g(x) = 1$, then $(g(x)) \bmod x^N - 1 = R$, and the corresponding code is $C = \mathbb{F}_q^N$. Therefore, among the cyclic codes of length $N$ there are always two trivial ones, the zero code and $\mathbb{F}_q^N$, which correspond to the zero ideal in $R$ and to $R$ itself, respectively.


Lemma 7.56 Let $g(x) \mid x^N - 1$ be the generating polynomial of a cyclic code $C$ with $\deg g(x) = N - k$. Then $C$ is an $[N, k]$ linear code. Further, let $g(x) = g_0 + g_1x + \cdots + g_{N-k}x^{N-k} \in R$ and let $g = (g_0, g_1, \ldots, g_{N-k}, 0, 0, \ldots, 0) \in C$ be the corresponding codeword; then the generating matrix $G$ of $C$ is

$$G = \begin{pmatrix} g \\ gT_1 \\ \vdots \\ gT_1^{k-1} \end{pmatrix}_{k\times N}. \tag{7.174}$$

Proof Let $C$ correspond to the ideal $C(x) = (g(x)) \bmod x^N - 1$; then $g(x), xg(x), \ldots, x^{k-1}g(x) \in C(x)$, and the corresponding codewords satisfy $\{g, gT_1, \ldots, gT_1^{k-1}\} \subset C$. Let us prove that $\{g, gT_1, \ldots, gT_1^{k-1}\}$ is a basis of $C$. If $a_i \in \mathbb{F}_q$ satisfy $\sum_{i=0}^{k-1} a_igT_1^i = 0$, then the corresponding polynomial is $0$, that is

$$\Big(\sum_{i=0}^{k-1} a_igT_1^i\Big)(x) = \sum_{i=0}^{k-1} a_i\,(gT_1^i)(x) = \sum_{i=0}^{k-1} a_ix^ig(x) = 0.$$

Since $\deg\big(x^ig(x)\big) \le (k - 1) + (N - k) = N - 1$, this polynomial vanishes in $R$ only if it vanishes in $\mathbb{F}_q[x]$; thus

$$\sum_{i=0}^{k-1} a_ix^i = 0 \;\Rightarrow\; a_i = 0, \quad 0 \le i \le k - 1.$$

That is, $\{g, gT_1, \ldots, gT_1^{k-1}\}$ is a linearly independent set in $C$. Further, every $c \in C$ is a linear combination of these vectors. Indeed, if $c \in C$, then $c(x) \in C(x)$, and by (7.173) there is

$$f(x) = f_0 + f_1x + \cdots + f_{k-1}x^{k-1} \;\Rightarrow\; c(x) = g(x)f(x) = \sum_{i=0}^{k-1} f_ix^ig(x) \;\Rightarrow\; c = \sum_{i=0}^{k-1} f_i\,gT_1^i.$$

This proves that the dimension of the linear subspace $C$ is $N - \deg g(x) = k$; that is, $C$ is an $[N, k]$ linear code, and its generating matrix is

$$G = \begin{pmatrix} g \\ gT_1 \\ \vdots \\ gT_1^{k-1} \end{pmatrix}_{k\times N}.$$

The lemma holds.

Next, we discuss the dual code of a cyclic code and its check matrix.


Lemma 7.57 Let $C \subset \mathbb{F}_q^N$ be a cyclic code with generating polynomial $g(x)$, $\deg g(x) = N - k$. Let $g(x)h(x) = x^N - 1$ with $h(x) = h_0 + h_1x + \cdots + h_kx^k$, let $h = (h_0, h_1, \ldots, h_k, 0, 0, \ldots, 0) \in \mathbb{F}_q^N$ be the corresponding codeword, and let $\bar h$ be its reverse codeword. Then the check matrix of $C$ is

$$H = \begin{pmatrix} \bar h \\ \bar hT_1 \\ \vdots \\ \bar hT_1^{N-k-1} \end{pmatrix}_{(N-k)\times N}. \tag{7.175}$$

The dual code $C^\perp$ of $C$ is an $[N, N - k]$ linear code, and

$$C^\perp = \{aH \mid a \in \mathbb{F}_q^{N-k}\};$$

$h(x)$ is called the check polynomial of the cyclic code $C$.


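Proof By Lemma 7.56, $C$ is a $k$-dimensional linear subspace whose generating matrix $G$ is given by (7.174). Because $g(x)h(x) = x^N - 1$, we have $g(x)h(x) = 0$ in the ring $R$. Comparing the coefficients of $x^i$ in $g(x)h(x) = x^N - 1$ gives, with $g_j = 0$ for $j > N - k$ and $h_j = 0$ for $j > k$,

$$g_0h_i + g_1h_{i-1} + \cdots + g_ih_0 = 0, \quad 1 \le i \le N - 1.$$

In matrix form these relations read $GH' = 0$. Since the $N - k$ rows of $H$ are linearly independent, $H$ is a generating matrix of the dual code of $C$, and the lemma holds.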
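As a quick numerical sanity check of Lemmas 7.56 and 7.57, the following Python sketch builds $G$ and $H$ for the binary cyclic code of length 7 generated by $g(x) = 1 + x + x^3$ (an illustrative choice, not taken from the text), taking the rows of $H$ to be $\bar h, \bar hT_1, \ldots, \bar hT_1^{N-k-1}$, and confirms that $GH' = 0$ over $\mathbb{F}_2$. Vectors are plain Python lists indexed from 0, and the helper names are hypothetical.

```python
# Binary cyclic code of length N = 7 generated by g(x) = 1 + x + x^3
# (coefficient lists start with the constant term).
N = 7
g = [1, 1, 0, 1]          # g(x) = 1 + x + x^3, deg g = N - k = 3
h = [1, 1, 1, 0, 1]       # h(x) = 1 + x + x^2 + x^4, g(x)h(x) = x^7 - 1 over F_2
k = len(h) - 1            # deg h = k = 4


def shift(v, s):
    """Return v T_1^s, i.e. the cyclic shift (7.169) applied s times."""
    s %= len(v)
    return v[len(v) - s:] + v[:len(v) - s]


def pad(p):
    """Turn a coefficient list into a codeword of length N."""
    return p + [0] * (N - len(p))


h_bar = pad(h)[::-1]                         # reverse codeword of h, cf. (7.170)
G = [shift(pad(g), i) for i in range(k)]     # rows g, gT_1, ..., gT_1^{k-1}
H = [shift(h_bar, i) for i in range(N - k)]  # rows h_bar, h_bar T_1, ..., h_bar T_1^{N-k-1}

# G H' over F_2: entry (i, j) is the inner product of row i of G with row j of H
GHt = [[sum(a * b for a, b in zip(rg, rh)) % 2 for rh in H] for rg in G]
print(GHt)   # a 4 x 3 zero matrix
```

The same script works for any divisor $g(x)$ of $x^N - 1$ over $\mathbb{F}_2$ once `N`, `g` and `h` are changed consistently.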


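Remark 7.5 The polynomial $\bar h(x)$ corresponding to the reverse codeword $\bar h$ is

$$\bar h(x) = h_0x^{N-1} + h_1x^{N-2} + \cdots + h_kx^{N-k-1}.$$

In general, even when $h(x) \mid x^N - 1$, we have $\bar h(x) \nmid x^N - 1$; therefore, $\bar h(x)$ is not necessarily the generating polynomial of the dual code. (The dual code of a cyclic code is nevertheless cyclic: if $c \in C^\perp$, then $\langle cT_1, a\rangle = \langle c, aT_1^{N-1}\rangle = 0$ for every $a \in C$.)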


Definition 7.15 Let $x^N - 1 = g_1(x)g_2(x)\cdots g_t(x)$ be the irreducible decomposition of $x^N - 1$ over $\mathbb{F}_q$, where each $g_i(x)$ $(1 \le i \le t)$ is a monic irreducible polynomial in $\mathbb{F}_q[x]$. Then the cyclic code generated by $g_i(x)$ is called the $i$-th maximal cyclic code in $\mathbb{F}_q^N$, denoted $M_i^+$. The cyclic code generated by

$$\frac{x^N - 1}{g_i(x)}$$

is called the $i$-th minimal cyclic code, denoted $M_i^-$.


Minimal cyclic codes are also called irreducible cyclic codes, because $M_i^-$ contains no nontrivial cyclic code of $\mathbb{F}_q^N$ other than itself. The irreducibility of minimal cyclic codes follows from the fact that the ideal $M_i^-(x)$ in $R$ corresponding to $M_i^-$ is a field. We give a purely algebraic proof.


Corollary 7.12 Let $M_i^-$ be the $i$-th minimal cyclic code of $\mathbb{F}_q^N$ $(1 \le i \le t)$ and $M_i^-(x)$ the ideal corresponding to $M_i^-$ in $R$; then $M_i^-(x)$ is a field, and thus $M_i^-$ contains no nontrivial cyclic code of $\mathbb{F}_q^N$ other than itself.


Proof Let $g(x) = (x^N - 1)/g_i(x)$, where $g_i(x)$ is an irreducible polynomial in $\mathbb{F}_q[x]$. By (7.173),

$$M_i^-(x) = g(x)\mathbb{F}_q[x]/(x^N - 1)\mathbb{F}_q[x] \cong \mathbb{F}_q[x]/g_i(x)\mathbb{F}_q[x],$$

where $g(x)\mathbb{F}_q[x]$ is the principal ideal generated by $g(x)$ in $\mathbb{F}_q[x]$. Since $g_i(x)$ is irreducible, $M_i^-(x)$ is a field.


Example 7.1 Determine all cyclic codes of length 7 over the binary finite field $\mathbb{F}_2$.

Solution: The polynomial $x^7 - 1$ has the following irreducible decomposition over $\mathbb{F}_2$:

$$x^7 - 1 = (x - 1)(x^3 + x + 1)(x^3 + x^2 + 1).$$

Therefore, $x^7 - 1$ has 7 positive factors over $\mathbb{F}_2$, and by Corollary 7.11 there are 8 cyclic codes of length 7 over $\mathbb{F}_2$, of which $0$ and $\mathbb{F}_2^7$ are the two trivial cyclic codes. There are three maximal cyclic codes, generated by $g(x) = x - 1$, $g(x) = x^3 + x + 1$ and $g(x) = x^3 + x^2 + 1$, respectively; their dimensions are 6, 4 and 4. Similarly, there are three minimal cyclic codes, one of dimension 1 and two of dimension 3.
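Example 7.1 can be checked by brute force. The sketch below (an illustration, not part of the original text) stores polynomials over $\mathbb{F}_2$ as Python integers whose bit $i$ is the coefficient of $x^i$, verifies the factorization of $x^7 - 1$, lists all divisors of $x^7 - 1$, and reports the dimensions $N - \deg g(x)$ of the cyclic codes they generate (Lemma 7.56).

```python
# Polynomials over F_2 are stored as Python ints: bit i is the coefficient of x^i.
def pmul(a, b):
    """Product of two polynomials over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a, b):
    """Remainder of a(x) divided by b(x) over F_2 (b != 0)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a


N = 7
x_N_minus_1 = (1 << N) | 1          # x^7 - 1 = x^7 + 1 over F_2

# the factorization used in Example 7.1: (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1)
assert pmul(pmul(0b11, 0b1011), 0b1101) == x_N_minus_1

# every divisor g(x) of x^7 - 1 (over F_2 every nonzero polynomial is monic)
divisors = [g for g in range(1, 1 << (N + 1)) if pmod(x_N_minus_1, g) == 0]
# the cyclic code generated by g(x) has dimension N - deg g(x)
dims = sorted(N - (g.bit_length() - 1) for g in divisors)
print(len(divisors), dims)   # 8 [0, 1, 3, 3, 4, 4, 6, 7]
```

The eight divisors give the eight cyclic codes of Example 7.1: dimensions 0 and 7 for the trivial codes, 6, 4, 4 for the maximal codes and 1, 3, 3 for the minimal ones.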

Another characterization of cyclic codes is given by their zeros. If $x^N - 1 = g_1(x)\cdots g_t(x)$, the ideal $M_i^+(x)$ in $R$ corresponding to the maximal cyclic code $M_i^+$ $(1 \le i \le t)$ generated by $g_i(x)$ is

$$M_i^+(x) = \{g_i(x)f(x) \mid f(x) = 0 \text{ or } 0 \le \deg f(x) \le N - \deg g_i(x) - 1\}.$$

Let $\beta$ be a root of $g_i(x)$ in its splitting field. Then $g_i(x)$ is the minimal polynomial of $\beta$ in $\mathbb{F}_q[x]$, so $c(\beta) = 0$ for all $c(x) \in M_i^+(x)$, and conversely $c(\beta) = 0$ implies $g_i(x) \mid c(x)$. Therefore,

$$M_i^+(x) = \{c(x) \mid c(x) \in R \text{ and } c(\beta) = 0\}.$$


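Example 7.2 Suppose $N = (q^m - 1)/(q - 1)$, $(m, q - 1) = 1$, and $\beta$ is a primitive $N$-th root of unity in $\mathbb{F}_{q^m}$. Then the cyclic code

$$C = \{c(x) \mid c(\beta) = 0,\ c(x) \in R\}$$

is equivalent to the Hamming code $[N, N - m]$.

Proof Because $(m, q - 1) = 1$ and

$$N = q^{m-1} + q^{m-2} + \cdots + q + 1 = (q - 1)\big(q^{m-2} + 2q^{m-3} + \cdots + (m - 1)\big) + m,$$

we have $(N, q - 1) = 1$. Therefore $\beta^{i(q-1)} \ne 1$ for $1 \le i \le N - 1$; in other words, $\beta^i \notin \mathbb{F}_q$ for all $1 \le i \le N - 1$ (an element $\gamma \in \mathbb{F}_{q^m}^*$ lies in $\mathbb{F}_q$ if and only if $\gamma^{q-1} = 1$). Since $\beta^i/\beta^j = \beta^{i-j} \notin \mathbb{F}_q$ for $i \ne j$, any two elements of $\{1, \beta, \beta^2, \ldots, \beta^{N-1}\}$ are linearly independent over $\mathbb{F}_q$. If each element is regarded as an $m$-dimensional column vector over $\mathbb{F}_q$, then the $m \times N$ matrix

$$H = [1, \beta, \beta^2, \ldots, \beta^{N-1}]_{m\times N}$$

is a check matrix of the cyclic code $C$, and any two columns of $H$ are linearly independent over $\mathbb{F}_q$. By definition, $C$ is the $[N, N - m]$ Hamming code.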
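For $q = 2$, $m = 3$ the code of Example 7.2 is the binary $[7, 4]$ Hamming code. Taking $\beta$ to be a root of $x^3 + x + 1$ (an assumption; a root of $x^3 + x^2 + 1$ would give an equivalent code), $C$ is the cyclic code generated by the minimal polynomial $g(x) = 1 + x + x^3$ of $\beta$. The sketch below builds the generating matrix (7.174), enumerates all codewords and checks that the code has 16 codewords, minimum distance 3, and is closed under $T_1$.

```python
from itertools import product

# q = 2, m = 3, N = (2^3 - 1)/(2 - 1) = 7; g(x) = 1 + x + x^3 is assumed to be
# the minimal polynomial of the chosen primitive 7th root of unity beta.
N = 7
g = [1, 1, 0, 1]
k = N - (len(g) - 1)                     # dimension N - m = 4


def shift(v):
    """v T_1 = (v_{N-1}, v_0, ..., v_{N-2}), cf. (7.169)."""
    return v[-1:] + v[:-1]


# generating matrix (7.174): rows g, gT_1, ..., gT_1^{k-1}
G, row = [], g + [0] * (N - len(g))
for _ in range(k):
    G.append(row)
    row = shift(row)

# all 2^k codewords aG over F_2
code = {tuple(sum(a[i] * G[i][j] for i in range(k)) % 2 for j in range(N))
        for a in product(range(2), repeat=k)}

min_dist = min(sum(c) for c in code if any(c))   # minimum weight = minimum distance
is_cyclic = all(tuple(shift(list(c))) in code for c in code)
print(len(code), min_dist, is_cyclic)            # 16 3 True
```

A minimum distance of 3 means every pair of columns of a check matrix is independent while some three are dependent, which is exactly the Hamming-code property used in the proof.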


Lemma 7.58 Let $C \subset \mathbb{F}_q^N$ be a cyclic code, let $C(x) \subset \mathbb{F}_q[x]/(x^N - 1)$ be the corresponding ideal, and suppose $(N, q) = 1$. Then $C(x)$ contains a multiplicative unit element $c(x) \in C(x)$ such that

$$c(x)d(x) \equiv d(x) \pmod{x^N - 1}, \quad \forall\, d(x) \in C(x).$$

The unit element $c(x)$ of $C(x)$ is unique.


Proof Because $(N, q) = 1$, $x^N - 1$ has no repeated roots over $\mathbb{F}_q$. Let $g(x)$ be the generating polynomial of $C$ and $h(x)$ the check polynomial of $C$, so that $g(x)h(x) = x^N - 1$. Therefore $(g(x), h(x)) = 1$, and there exist $a(x), b(x) \in \mathbb{F}_q[x]$ such that

$$a(x)g(x) + b(x)h(x) = 1.$$

Let $c(x) = a(x)g(x) = 1 - b(x)h(x) \in C(x)$. For any $d(x) \in C(x)$, write $d(x) = g(x)f(x)$; thus

$$\begin{aligned} c(x)d(x) &= a(x)g(x)g(x)f(x) \\ &= \big(1 - b(x)h(x)\big)g(x)f(x) \\ &= g(x)f(x) - b(x)h(x)g(x)f(x) \\ &= d(x) - b(x)f(x)(x^N - 1). \end{aligned}$$

Therefore

$$c(x)d(x) \equiv d(x) \pmod{x^N - 1},$$

i.e., $c(x)d(x) = d(x)$ in $R = \mathbb{F}_q[x]/(x^N - 1)$. That is, $c(x)$ is the multiplicative unit element of $C(x)$. It is unique: if $c_1(x)$ is another unit element of $C(x)$, then $c_1(x) = c(x)c_1(x) = c(x)$. The lemma holds.


Definition 7.16 Let $C \subset \mathbb{F}_q^N$ be a cyclic code; the multiplicative unit element $c(x)$ of $C(x)$ is called the idempotent element of $C$. If $C = M_i^-$ is the $i$-th minimal cyclic code, the idempotent element of $C$ is called a primitive idempotent element, denoted $\theta_i(x)$.
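The proof of Lemma 7.58 is constructive: the idempotent is $c(x) = a(x)g(x)$, where $a(x)$ comes from the extended Euclidean algorithm applied to $g(x)$ and $h(x)$. The following is a minimal sketch over $\mathbb{F}_2$ for the illustrative case $N = 7$, $g(x) = 1 + x + x^3$; polynomials are stored as integer bitmasks, and all helper names are hypothetical.

```python
# Polynomials over F_2 as int bitmasks: bit i is the coefficient of x^i.
def pmul(a, b):
    """Product over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdivmod(a, b):
    """Quotient and remainder of a(x) / b(x) over F_2 (b != 0)."""
    q, db = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        s = a.bit_length() - 1 - db
        q ^= 1 << s
        a ^= b << s
    return q, a


def egcd(a, b):
    """Extended Euclid over F_2: return (d, s, t) with s*a + t*b = d = gcd(a, b)."""
    s0, s1, t0, t1 = 1, 0, 0, 1
    while b:
        q, r = pdivmod(a, b)
        a, b = b, r
        s0, s1 = s1, s0 ^ pmul(q, s1)   # subtraction is XOR in characteristic 2
        t0, t1 = t1, t0 ^ pmul(q, t1)
    return a, s0, t0


N = 7
MOD = (1 << N) | 1                      # x^7 - 1 = x^7 + 1 over F_2


def mulmod(u, v):
    """Product in R = F_2[x]/(x^7 - 1)."""
    return pdivmod(pmul(u, v), MOD)[1]


g = 0b1011                              # generating polynomial g(x) = 1 + x + x^3
h, r = pdivmod(MOD, g)                  # check polynomial h(x) = (x^7 - 1)/g(x), r == 0
d, a, b = egcd(g, h)                    # a(x)g(x) + b(x)h(x) = 1 because (N, 2) = 1
c = mulmod(a, g)                        # idempotent c(x) = a(x)g(x) in R
print(bin(c), mulmod(c, c) == c)
```

Here the extended Euclidean algorithm returns $a(x) = x$, so $c(x) = x + x^2 + x^4$, and the last line confirms $c(x)^2 = c(x)$ in $R$.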


Lemma 7.59 Let $C_1 \subset \mathbb{F}_q^N$ and $C_2 \subset \mathbb{F}_q^N$ be two cyclic codes, $(N, q) = 1$, with idempotent elements $c_1(x)$ and $c_2(x)$, respectively. Then

(i) $C_1 \cap C_2$ is also a cyclic code of $\mathbb{F}_q^N$, with idempotent element $c_1(x)c_2(x)$;
(ii) $C_1 + C_2$ is also a cyclic code of $\mathbb{F}_q^N$, with idempotent element $c_1(x) + c_2(x) - c_1(x)c_2(x)$.


Proof It is obvious that $C_1 \cap C_2$ and $C_1 + C_2$ are linear codes in $\mathbb{F}_q^N$. Since $C_1$ and $C_2$ correspond to the ideals $C_1(x)$ and $C_2(x)$ in $R$, and

$$C_1(x) \cap C_2(x) \quad\text{and}\quad C_1(x) + C_2(x)$$

are still ideals in $R$, the corresponding codes $C_1 \cap C_2$ and $C_1 + C_2$ are cyclic codes. The conclusion on idempotents is not difficult to verify. The lemma holds.


In 1959 A. Hocquenghem, and in 1960 R. Bose and D. Ray-Chaudhuri, independently proposed a special class of cyclic codes whose minimum distance is bounded below by a prescribed value. They are now generally called BCH codes.

Definition 7.17 A cyclic code $C \subset \mathbb{F}_q^N$ of length $N$ is called a $\delta$-BCH code if its generating polynomial is the least common multiple of the minimal polynomials of $\beta, \beta^2, \ldots, \beta^{\delta-1}$, where $\delta$ is a positive integer and $\beta$ is a primitive $N$-th root of unity. A $\delta$-BCH code is also called a BCH code with design distance $\delta$. If $\beta \in \mathbb{F}_{q^m}$ and $N = q^m - 1$, such BCH codes are called primitive.


Lemma 7.60 Let $d$ be the minimum distance of a $\delta$-BCH code; then $d \ge \delta$.


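Proof Suppose $x^N - 1 = (x - 1)g_1(x)g_2(x)\cdots g_s(x)$ and $\beta$ is a primitive $N$-th root of unity over $\mathbb{F}_q$; then $\beta$ is a root of some $g_i(x)$. Let $\deg g_i(x) = m$, so that $\beta \in \mathbb{F}_{q^m}$. Because $[\mathbb{F}_{q^m} : \mathbb{F}_q] = m$, we can regard each power of $\beta$ as an $m$-dimensional column vector over $\mathbb{F}_q$. Let $H$ be the following $m(\delta - 1) \times N$ matrix:

$$H = \begin{pmatrix} 1 & \beta & \beta^2 & \cdots & \beta^{N-1} \\ 1 & \beta^2 & \beta^4 & \cdots & \beta^{2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \beta^{\delta-1} & \beta^{2(\delta-1)} & \cdots & \beta^{(N-1)(\delta-1)} \end{pmatrix}_{m(\delta-1)\times N}.$$

In fact, $H$ is a check matrix of the $\delta$-BCH code $C$, that is

$$c \in C \iff cH' = 0.$$

We prove that any $\delta - 1$ columns of $H$ are linearly independent. Let the first components of these $\delta - 1$ columns be $\beta^{i_1}, \beta^{i_2}, \ldots, \beta^{i_{\delta-1}}$, where $0 \le i_1 < i_2 < \cdots < i_{\delta-1} \le N - 1$; these powers are distinct because $\beta$ is a primitive $N$-th root of unity. Regarding the entries as elements of $\mathbb{F}_{q^m}$, the corresponding $(\delta - 1) \times (\delta - 1)$ determinant $\Delta$ is a Vandermonde determinant, and

$$\Delta = \beta^{i_1 + i_2 + \cdots + i_{\delta-1}}\prod_{r > s}\big(\beta^{i_r} - \beta^{i_s}\big) \ne 0.$$

Therefore, any $\delta - 1$ columns of $H$ are linearly independent. Thus, the minimum distance of $C$ satisfies $d \ge \delta$.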
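To see the bound of Lemma 7.60 on a concrete code, take $q = 2$, $N = 15$, $\delta = 5$ and let $\beta$ be a root of the primitive polynomial $x^4 + x + 1$ (illustrative choices). Then $\beta, \beta^2, \beta^4$ share the minimal polynomial $x^4 + x + 1$, and $\beta^3$ has minimal polynomial $x^4 + x^3 + x^2 + x + 1$, so the generating polynomial is their product. The sketch enumerates all $2^7 - 1$ nonzero codewords $f(x)g(x)$ and computes the minimum weight.

```python
# Polynomials over F_2 as int bitmasks: bit i is the coefficient of x^i.
def pmul(a, b):
    """Product over F_2."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pmod(a, b):
    """Remainder of a(x) / b(x) over F_2."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a


N, delta = 15, 5
m1 = 0b10011             # x^4 + x + 1: minimal polynomial of beta, beta^2, beta^4
m3 = 0b11111             # x^4 + x^3 + x^2 + x + 1: minimal polynomial of beta^3
g = pmul(m1, m3)         # lcm of the minimal polynomials of beta, ..., beta^(delta-1)
assert pmod((1 << N) | 1, g) == 0        # g(x) divides x^15 - 1
k = N - (g.bit_length() - 1)             # dimension N - deg g = 7

# every nonzero codeword is f(x)g(x) with 0 <= deg f <= k - 1 (degree < N, no reduction)
d = min(bin(pmul(f, g)).count("1") for f in range(1, 1 << k))
print(k, d, d >= delta)                  # 7 5 True
```

Since $g(x) = x^8 + x^7 + x^6 + x^4 + 1$ itself has weight 5, the bound $d \ge \delta$ is attained exactly for this code.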


Now we can introduce the design principle of the McEliece/Niederreiter cryptosystem. Its basic mathematical idea is based on the decoding principle of error-correcting codes. Recall the concept of an error-correcting code from Chap. 2: a code $C \subset \mathbb{F}_q^N$ is called a $t$-error correcting code ($t \ge 1$ a positive integer) if for every $y \in \mathbb{F}_q^N$ there is at most one codeword $c \in C$ with $d(c, y) \le t$, where $d(c, y)$ is the Hamming distance between $c$ and $y$. We know that if the minimum distance of a code $C$ is $d$, then $C$ is a $t$-error correcting code, where $t = \lfloor (d - 1)/2 \rfloor$ is the largest integer not greater than $(d - 1)/2$. Lemma 7.60 proves the existence of $t$-error correcting codes for any positive integer $t$, namely the $(2t + 1)$-BCH codes ($\delta = 2t + 1$). Codes of this kind, together with the Goppa codes (see the next section), provide a theoretical basis for the McEliece/Niederreiter cryptosystem. Next, we introduce the working mechanism of this kind of cryptosystem in detail. First, let us look at key generation.

Private key: Select a $t$-error correcting code $C \subset \mathbb{F}_q^N$, $C = [N, k]$, and let $H$ be the check matrix of $C$, an $(N - k) \times N$ matrix. The map $x \mapsto xH'$ sends $\mathbb{F}_q^N$ to $\mathbb{F}_q^{N-k}$. Let us prove that this map is injective on the vectors whose weight is not greater than $t$.


Lemma 7.61 For all $x, y \in \mathbb{F}_q^N$, if $xH' = yH'$, $w(x) \le t$ and $w(y) \le t$, then $x = y$.


Proof By hypothesis,

$$xH' = yH' \Rightarrow (x - y)H' = 0 \Rightarrow x - y \in C.$$

Obviously, the Hamming distance between $x$ and $0$ is $d(0, x) = w(x) \le t$, and the Hamming distance between $x$ and $x - y$ is

$$d(x, x - y) = w\big(x - (x - y)\big) = w(y) \le t.$$

Because $C$ is a $t$-error correcting code and the codewords $0$ and $x - y$ both lie within distance $t$ of $x$, we get $x - y = 0$. The lemma holds.
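Lemma 7.61 can be checked numerically on the binary $[15, 7]$ code generated by $g(x) = x^8 + x^7 + x^6 + x^4 + 1$, which has minimum distance 5 and hence corrects $t = 2$ errors. The sketch below uses the remainder $x(x) \bmod g(x)$ as the syndrome; this is a linear map $\mathbb{F}_2^{15} \to \mathbb{F}_2^8$ with kernel $C$, i.e. $x \mapsto xH'$ for a suitable check matrix $H$ (an assumption of this illustration, not the matrix (7.175)). All $1 + 15 + 105 = 121$ vectors of weight at most 2 should have distinct syndromes.

```python
from itertools import combinations


def pmod(a, b):
    """Remainder of a(x) / b(x) over F_2 (polynomials as int bitmasks)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a


N, t = 15, 2
g = 0b111010001          # x^8 + x^7 + x^6 + x^4 + 1, minimum distance 5, so t = 2

# every x in F_2^15 with w(x) <= t, encoded as a bitmask
patterns = [0] + [sum(1 << i for i in S)
                  for w in range(1, t + 1) for S in combinations(range(N), w)]
syndromes = {pmod(x, g) for x in patterns}   # x -> x(x) mod g(x) plays the role of x -> xH'
print(len(patterns), len(syndromes))         # 121 121
```

Because the syndrome map is injective on these vectors, a lookup table from syndromes to error patterns is a complete decoder for up to $t$ errors, which is the fact the decryption step relies on.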


We use the $t$-error correcting code $C$ and its check matrix $H$ as the private key.

Public key: To generate the public key, we randomly select an $N \times N$ permutation matrix $P$. Let $I_N = [e_1, e_2, \ldots, e_N]$ be the identity matrix of order $N$ and $\sigma \in S_N$ a permutation of $N$ letters; then

$$P = \sigma(I_N) = [e_{\sigma(1)}, e_{\sigma(2)}, \ldots, e_{\sigma(N)}].$$

This kind of matrix is also called a Weyl matrix. Alternatively, one can randomly select a nonsingular diagonal matrix $\mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_N]$ $(\lambda_i \in \mathbb{F}_q,\ \lambda_i \ne 0)$ and set

$$P = \sigma(\mathrm{diag}[\lambda_1, \lambda_2, \ldots, \lambda_N]) = \mathrm{diag}[\lambda_{\sigma(1)}, \lambda_{\sigma(2)}, \ldots, \lambda_{\sigma(N)}].$$

Let $M$ be an invertible $(N - k) \times (N - k)$ matrix. The public key is the $N \times (N - k)$ matrix $K$ generated as follows:

$$K = PH'M.$$

We take $K$ as the public key and $H$, $P$ and $M$ as the private key.
Encryption: Let $m \in \mathbb{F}_q^N$ be a vector with $w(m) \le t$, and encrypt the plaintext $m$ as follows:

$$c = mK \in \mathbb{F}_q^{N-k},$$

where $c$ is the ciphertext. In fact, a plaintext of length $N$ and weight at most $t$ over $\mathbb{F}_q$ is encrypted through the public key $K$ into a ciphertext of length $N - k$ over $\mathbb{F}_q$.
Decryption: After receiving the ciphertext $c$, decrypt it with the private keys $H$, $P$ and $M$:

$$cM^{-1} = mKM^{-1} = mPH'MM^{-1} = mPH'.$$

Since $mP$ and $m$ have the same weight, that is,

$$w(m) = w(mP) \le t,$$

we can use the decoding principle of error-correcting codes: all vectors $x \in \mathbb{F}_q^N$ satisfying $xH' = mPH'$ constitute an additive coset of the code $C$, and by Lemma 7.61, $mP$ is the only vector of weight at most $t$ in this coset. Taking it as the leader of the coset, $mP$ is recovered exactly, that is

$$mPH' \xrightarrow{\ \text{decode}\ } mP.$$

Finally, we get the plaintext $m = (mP)P^{-1}$.
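The key generation, encryption and decryption steps above can be traced end to end on a toy instance. The sketch below is a hypothetical illustration with far-too-small parameters: a binary $[7, 4]$ Hamming code with $t = 1$ (its check matrix has the binary expansions of $1, \ldots, 7$ as columns), a random permutation matrix $P$, a fixed invertible $M$, and syndrome decoding by table lookup. It is not a secure implementation.

```python
import random

# Hypothetical toy parameters (far too small to be secure): binary Hamming code,
# N = 7, k = 4, t = 1.  Column j of H is the binary expansion of j + 1, so the
# columns are distinct and nonzero and C = {x : xH' = 0} corrects one error.
N, k = 7, 4
H = [[((j + 1) >> r) & 1 for j in range(N)] for r in range(N - k)]   # (N-k) x N


def matmul(A, B):
    """Matrix product over F_2 (matrices are lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# private key: permutation matrix P = sigma(I_N) and an invertible M
sigma = list(range(N))
random.Random(1).shuffle(sigma)
P = [[1 if sigma[j] == i else 0 for j in range(N)] for i in range(N)]  # column j = e_sigma(j)
M = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
M_inv = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]          # inverse of M over F_2

# public key K = P H' M, an N x (N - k) matrix
K = matmul(matmul(P, transpose(H)), M)


def encrypt(m):
    """c = mK for a plaintext m in F_2^N with w(m) <= t."""
    return matmul([m], K)[0]


def decrypt(c):
    s = matmul([c], M_inv)[0]            # c M^{-1} = (mP) H'
    e = [0] * N                          # recover mP, whose weight is <= 1
    if any(s):
        e[transpose(H).index(s)] = 1     # the syndrome of a weight-1 vector is a column of H
    return matmul([e], transpose(P))[0]  # m = (mP) P^{-1}, and P^{-1} = P' for a permutation


m = [0, 0, 0, 0, 1, 0, 0]
print(encrypt(m), decrypt(encrypt(m)) == m)
```

With a real Goppa code the table lookup would be replaced by an efficient algebraic decoder; the flow of matrices is the same.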


7.9 Ajtai/Dwork Cryptosystem 


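By choosing an appropriate $n \times m$ matrix $A \in \mathbb{Z}_q^{n\times m}$, two $m$-dimensional $q$-ary lattices $\Lambda_q(A)$ and $\Lambda_q^\perp(A)$ are defined (see (7.45) and (7.46)):

$$\Lambda_q(A) = \{y \in \mathbb{Z}^m \mid \exists\, x \in \mathbb{Z}^n,\ y \equiv A'x \pmod q\}$$

and

$$\Lambda_q^\perp(A) = \{y \in \mathbb{Z}^m \mid Ay \equiv 0 \pmod q\}.$$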


Using the matrix $A$, a collision-resistant hash function can be defined:

$$f_A: \{0, 1, \ldots, d - 1\}^m \to \mathbb{Z}_q^n, \tag{7.176}$$

where, for any $y \in \{0, 1, \ldots, d - 1\}^m$, $f_A(y)$ is defined as

$$f_A(y) = Ay \bmod q. \tag{7.177}$$


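If the parameters $d, q, n, m$ satisfy

$$n\log q < m\log d, \quad \text{i.e.,} \quad m > \frac{n\log q}{\log d}, \tag{7.178}$$

then the domain of $f_A$ has $d^m > q^n$ elements while its range has $q^n$, so the hash function $f_A$ must produce collisions; that is, there are $y, \tilde y \in \{0, 1, \ldots, d - 1\}^m$ with $y \ne \tilde y$ and $f_A(y) = f_A(\tilde y)$. By (7.177), we directly have

$$A(y - \tilde y) \equiv 0 \pmod q \Rightarrow y - \tilde y \in \Lambda_q^\perp(A),$$

which shows that a collision $y, \tilde y$ of the hash function $f_A$ directly yields a short vector $y - \tilde y$ (all of its entries lie in $\{-(d - 1), \ldots, d - 1\}$) of the $q$-ary lattice $\Lambda_q^\perp(A)$.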
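The pigeonhole argument behind (7.178) is easy to observe for tiny, insecure parameters. In the sketch below (hypothetical values $n = 2$, $m = 6$, $q = 5$, $d = 2$, and a random $A$), $d^m = 64 > q^n = 25$, so an exhaustive search must find a collision, and the difference of the colliding inputs is a nonzero vector of $\Lambda_q^\perp(A)$ with entries in $\{-1, 0, 1\}$.

```python
import itertools
import random

# Hypothetical toy parameters: m log2(d) = 6 > n log2(q) ~ 4.64, so d^m = 64 > q^n = 25.
n, m, q, d = 2, 6, 5, 2
rng = random.Random(0)
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]


def f_A(y):
    """f_A(y) = Ay mod q, cf. (7.177)."""
    return tuple(sum(A[i][j] * y[j] for j in range(m)) % q for i in range(n))


seen, collision = {}, None
for y in itertools.product(range(d), repeat=m):   # exhaustive search (toy sizes only)
    v = f_A(y)
    if v in seen:
        collision = (seen[v], y)
        break
    seen[v] = y

y1, y2 = collision
diff = [a - b for a, b in zip(y1, y2)]            # y - y~, entries in {-1, 0, 1}
in_lattice = all(sum(A[i][j] * diff[j] for j in range(m)) % q == 0 for i in range(n))
print(y1, y2, diff, in_lattice)
```

For cryptographic sizes the search space is astronomically large, and collision resistance rests on the hardness of finding such short vectors in $\Lambda_q^\perp(A)$.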

To obtain a collision-resistant hash function, the choice of the $n \times m$ matrix $A$ is crucial. First, we can select the parameters as follows: let $d = 2$, $q = n^2$, $n \mid m$ and $m\log 2 > n\log q$, where $n$ is a positive integer. In the Ajtai/Dwork cryptographic algorithm, there are two choices of the parameter matrix $A$: one is a cyclic matrix and the other is the more general ideal matrix. The corresponding $q$-ary lattices $\Lambda_q^\perp(A)$ are the cyclic lattice and the ideal lattice, respectively.

Cyclic lattice

Because $n \mid m$, $A$ can be divided into $m/n$ cyclic matrices of order $n \times n$, that is

$$A = [A^{(1)}, A^{(2)}, \ldots, A^{(m/n)}], \tag{7.179}$$

where $a^{(i)} \in \mathbb{Z}_q^n$ is an $n$-dimensional column vector and $A^{(i)}$ is the cyclic matrix generated by $a^{(i)}$ (see (7.149)), that is

$$A^{(i)} = [a^{(i)}, Ta^{(i)}, \ldots, T^{n-1}a^{(i)}], \quad 1 \le i \le \frac{m}{n}.$$

$A$ is called an $n \times m$ generalized cyclic matrix, and the $q$-ary lattice in $\mathbb{R}^m$ defined by $A$,

$$\Lambda_q^\perp(A) = \{y \in \mathbb{Z}^m \mid Ay \equiv 0 \pmod q\},$$

is called a cyclic lattice. The Ajtai/Dwork cryptosystem based on cyclic lattices can be stated as follows:

Algorithm 1: Hash function based on cyclic lattice.

Parameters: $q, n, m, d$ are positive integers, $n \mid m$, $m\log d > n\log q$.

Secret key: $m/n$ column vectors $a^{(i)} \in \mathbb{Z}_q^n$ $(1 \le i \le m/n)$.

Hash function $f_A: \{0, 1, \ldots, d - 1\}^m \to \mathbb{Z}_q^n$, defined as

$$f_A(y) = Ay \pmod q,$$

where the generalized cyclic matrix $A \in \mathbb{Z}_q^{n\times m}$ is given by (7.179).
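Algorithm 1 can be sketched in a few lines. The code below assumes $T$ acts by $Tv = (v_{n-1}, v_0, \ldots, v_{n-2})'$, which is what (7.180) gives for $h(x) = x^n - 1$; the parameters $n = 4$, $m = 24$, $q = 17$, $d = 2$ are hypothetical toy values chosen only to satisfy $n \mid m$ and $m\log d > n\log q$, and the function names are illustrative.

```python
import random


def rotate(v):
    """T v = (v_{n-1}, v_0, ..., v_{n-2})', the rotation (7.180) for h(x) = x^n - 1."""
    return v[-1:] + v[:-1]


def cyclic_matrix(a):
    """Rows of the n x n cyclic matrix [a, Ta, ..., T^{n-1} a]."""
    cols, v = [], list(a)
    for _ in range(len(a)):
        cols.append(v)
        v = rotate(v)
    return [list(row) for row in zip(*cols)]


def hash_cyclic(keys, y, q):
    """f_A(y) = Ay mod q with A = [A^(1), ..., A^(m/n)] as in (7.179)."""
    n = len(keys[0])
    out = [0] * n
    for i, a in enumerate(keys):
        A_i, y_i = cyclic_matrix(a), y[i * n:(i + 1) * n]
        for r in range(n):
            out[r] += sum(A_i[r][c] * y_i[c] for c in range(n))
    return [x % q for x in out]


# hypothetical toy parameters: n | m and m log2(d) = 24 > n log2(q) ~ 16.35
n, m, q, d = 4, 24, 17, 2
rng = random.Random(0)
keys = [[rng.randrange(q) for _ in range(n)] for _ in range(m // n)]   # a^(1), ..., a^(m/n)
y = [rng.randrange(d) for _ in range(m)]
digest = hash_cyclic(keys, y, q)

# rotating every block of y rotates the digest, because each A^(i) commutes with T
y_rot = [x for i in range(m // n) for x in rotate(y[i * n:(i + 1) * n])]
print(digest, hash_cyclic(keys, y_rot, q) == rotate(digest))
```

The cyclic structure lets the key consist of only $m/n$ vectors instead of a full $n \times m$ matrix; the last line illustrates the algebraic symmetry that comes with it.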

We can extend the above concepts of cyclic matrix and cyclic lattice to more general cases and obtain the concepts of ideal matrix and ideal lattice. Let $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$ be a monic integer-coefficient polynomial of degree $n$, and define the rotation matrix $T_h$ as

$$T_h = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -a_{n-1} \end{pmatrix}. \tag{7.180}$$

If $h(x) = x^n - 1$, then $T_h = T$, the matrix studied in Sect. 7.7 of this chapter. Here, we discuss the more general $T_h$. Obviously, when the constant term $a_0 \ne 0$, $T_h$ is a nonsingular square matrix of order $n$, and $\det(T_h) = (-1)^na_0$.


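Lemma 7.62 The characteristic polynomial of the rotation matrix $T_h$ is $f(\lambda) = h(\lambda)$.

Proof By definition, the characteristic polynomial $f(\lambda)$ of $T_h$ is

$$f(\lambda) = \det(\lambda I_n - T_h) = \begin{vmatrix} \lambda & 0 & \cdots & 0 & a_0 \\ -1 & \lambda & \cdots & 0 & a_1 \\ 0 & -1 & \cdots & 0 & a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & -1 & \lambda + a_{n-1} \end{vmatrix}.$$

Adding $\lambda$ times the last row to the row above it, then $\lambda$ times the new $(n-1)$-th row to the $(n-2)$-th row, and so on up to the first row, turns the first row into $(0, \ldots, 0, h(\lambda))$ and leaves $-1$ on the subdiagonal; expanding along the first row then gives

$$f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = h(\lambda).$$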


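Lemma 7.63 Let $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$. If $a_0 \ne 0$, then the rotation matrix $T_h$ is a nonsingular square matrix of order $n$, and

$$T_h^{-1} = \begin{pmatrix} -a_0^{-1}a_1 & 1 & 0 & \cdots & 0 \\ -a_0^{-1}a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ -a_0^{-1}a_{n-1} & 0 & 0 & \cdots & 1 \\ -a_0^{-1} & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

Proof By the definition of $T_h$, write $T_h = \begin{pmatrix} 0 & -a_0 \\ I_{n-1} & -\alpha \end{pmatrix}$ with $\alpha = (a_1, \ldots, a_{n-1})'$. Then

$$T_h\begin{pmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 0 & -a_0 \\ I_{n-1} & -\alpha \end{pmatrix}\begin{pmatrix} -a_0^{-1}\alpha & I_{n-1} \\ -a_0^{-1} & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & I_{n-1} \end{pmatrix} = I_n.$$

So the lemma holds.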
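The inverse formula of Lemma 7.63 can be checked with exact rational arithmetic from Python's standard `fractions` module. The polynomial $h(x) = x^4 + 3x^3 - x + 2$ below is an arbitrary illustrative choice with $a_0 \ne 0$, and the helper names are hypothetical.

```python
from fractions import Fraction


def rotation_matrix(a):
    """T_h of (7.180) for h(x) = x^n + a[n-1] x^{n-1} + ... + a[1] x + a[0]."""
    n = len(a)
    T = [[0] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = 1                 # I_{n-1} block below the first row
    for i in range(n):
        T[i][n - 1] = -a[i]             # last column (-a_0, -a_1, ..., -a_{n-1})'
    return T


def inverse_formula(a):
    """T_h^{-1} as stated in Lemma 7.63 (requires a_0 != 0)."""
    n, a0 = len(a), Fraction(a[0])
    Tinv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n - 1):
        Tinv[i][0] = -a[i + 1] / a0     # first column -a_0^{-1}(a_1, ..., a_{n-1}, 1)'
        Tinv[i][i + 1] = Fraction(1)    # I_{n-1} block in the upper right
    Tinv[n - 1][0] = -1 / a0
    return Tinv


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


a = [2, -1, 0, 3]                       # illustrative h(x) = x^4 + 3x^3 - x + 2
n = len(a)
T, Tinv = rotation_matrix(a), inverse_formula(a)
identity = [[int(i == j) for j in range(n)] for i in range(n)]
print(matmul(T, Tinv) == identity, matmul(Tinv, T) == identity)   # True True
```

Over $\mathbb{Z}$ the inverse is again an integer matrix only when $a_0 = \pm 1$, which is why exact fractions are used here.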


For a given monic polynomial $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$ of degree $n$, let $R$ be the residue class ring of $\mathbb{Z}[x]$ modulo $h(x)$, i.e.,

$$R = \mathbb{Z}[x]/(h(x)), \tag{7.181}$$

where $(h(x))$ is the ideal generated by $h(x)$ in $\mathbb{Z}[x]$. Because $h(x)$ is monic with $\deg h(x) = n$, every polynomial $g(x) \in R$ has a unique expression $g(x) = g_{n-1}x^{n-1} + g_{n-2}x^{n-2} + \cdots + g_1x + g_0$. Define the mapping $\sigma: R \to \mathbb{Z}^n$ as

$$\sigma(g(x)) = \begin{pmatrix} g_0 \\ g_1 \\ \vdots \\ g_{n-1} \end{pmatrix} \in \mathbb{Z}^n. \tag{7.182}$$

Obviously, $\sigma$ is an isomorphism of Abelian groups $R \to \mathbb{Z}^n$. Therefore, any polynomial $g(x)$ in $R$ can be regarded as an $n$-dimensional integer column vector.


Definition 7.18 For any $n$-dimensional column vector $g = \sigma(g(x)) = (g_0, g_1, \ldots, g_{n-1})' \in \mathbb{Z}^n$, define

$$T_h^*(g) = [g, T_hg, T_h^2g, \ldots, T_h^{n-1}g]_{n\times n}; \tag{7.183}$$

the square matrix $T_h^*(g)$ of order $n$ is called the ideal matrix generated by the vector $g$.


The ideal matrix is a generalization of the cyclic matrix: the former corresponds to a monic polynomial $h(x)$ of degree $n$, and the latter corresponds to the special polynomial $x^n - 1$. We first prove that the ideal matrix $T_h^*(g)$ generated by any vector $g \in \mathbb{Z}^n$ commutes with the rotation matrix $T_h$ under matrix multiplication.


Lemma 7.64 For any given monic polynomial $h(x) \in \mathbb{Z}[x]$ of degree $n$ and any $n$-dimensional column vector $g \in \mathbb{Z}^n$, we have

$$T_h \cdot T_h^*(g) = T_h^*(g) \cdot T_h.$$
Proof Let $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}[x]$. By Lemma 7.62, the characteristic polynomial of the rotation matrix $T_h$ is $h(\lambda)$, so by the Hamilton-Cayley theorem we have

$$T_h^n + a_{n-1}T_h^{n-1} + \cdots + a_1T_h + a_0I_n = 0. \tag{7.184}$$

Hence

$$\begin{aligned} T_h^*(g)\,T_h &= [g, T_hg, \ldots, T_h^{n-1}g]\,T_h \\ &= [T_hg, T_h^2g, \ldots, T_h^{n-1}g,\ -a_0g - a_1T_hg - \cdots - a_{n-1}T_h^{n-1}g] \\ &= [T_hg, T_h^2g, \ldots, T_h^{n-1}g,\ T_h^ng] \\ &= T_h[g, T_hg, \ldots, T_h^{n-1}g] \\ &= T_h \cdot T_h^*(g). \end{aligned}$$

When the monic integer-coefficient polynomial $h(x)$ of degree $n$ has been selected, we want to establish the correspondence between the ideals of the quotient ring $R = \mathbb{Z}[x]/(h(x))$ and integer lattices $L \subset \mathbb{Z}^n$. First, we define the concept of an ideal lattice. In short, an ideal lattice is an integer lattice generated by an ideal matrix.
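Lemma 7.64 and the Hamilton-Cayley identity (7.184) can be verified numerically. The sketch builds $T_h$ from (7.180) and $T_h^*(g)$ from (7.183) for an illustrative $h(x) = x^4 + 3x^3 - x + 2$ and a random integer vector $g$; the helper names are hypothetical.

```python
import random


def rotation_matrix(a):
    """T_h of (7.180) for h(x) = x^n + a[n-1] x^{n-1} + ... + a[1] x + a[0]."""
    n = len(a)
    T = [[0] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = 1              # I_{n-1} block below the first row
    for i in range(n):
        T[i][n - 1] = -a[i]          # last column (-a_0, ..., -a_{n-1})'
    return T


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def ideal_matrix(a, g):
    """Rows of T_h^*(g) = [g, T_h g, ..., T_h^{n-1} g], cf. (7.183)."""
    T, cols, v = rotation_matrix(a), [], list(g)
    for _ in range(len(g)):
        cols.append(v)
        v = matvec(T, v)
    return [list(row) for row in zip(*cols)]


a = [2, -1, 0, 3]                     # illustrative h(x) = x^4 + 3x^3 - x + 2
n = len(a)
T = rotation_matrix(a)
rng = random.Random(0)
g = [rng.randrange(-5, 6) for _ in range(n)]
Tg = ideal_matrix(a, g)
commutes = matmul(T, Tg) == matmul(Tg, T)                  # Lemma 7.64

# Hamilton-Cayley (7.184): T^n + a_{n-1} T^{n-1} + ... + a_1 T + a_0 I = 0
powers = [[[int(i == j) for j in range(n)] for i in range(n)]]   # T^0 = I
for _ in range(n):
    powers.append(matmul(powers[-1], T))
coeffs = a + [1]                                           # a_0, ..., a_{n-1}, 1
CH = [[sum(coeffs[p] * powers[p][i][j] for p in range(n + 1)) for j in range(n)]
      for i in range(n)]
print(commutes, all(x == 0 for row in CH for x in row))    # True True
```

Changing `a` to any other monic $h(x)$, or `g` to any other integer vector, should leave both checks true.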


Definition 7.19 Let $g = (g_0, g_1, \ldots, g_{n-1})' \in \mathbb{Z}^n$ be a given column vector and $T_h^*(g)$ the ideal matrix generated by $g$; the integer lattice $L = L(T_h^*(g))$ is called an ideal lattice.


Our main result is the one-to-one correspondence between the principal ideals of $R = \mathbb{Z}[x]/(h(x))$ and ideal lattices. This also explains why $L(T_h^*(g))$ is called an ideal lattice.


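Theorem 7.10 The principal ideals in $R = \mathbb{Z}[x]/(h(x))$ correspond one-to-one to the ideal lattices in $\mathbb{Z}^n$. Specifically,

(i) If $N = (g(x))$ is any principal ideal in $R$, then
$$\sigma(N) = \{\sigma(f) \mid f \in N\} = L(T_h^*(\sigma(g(x)))) = L(T_h^*(g)).$$
(ii) If $g = (g_0, g_1, \ldots, g_{n-1})' \in \mathbb{Z}^n$ and $L(T_h^*(g)) \subset \mathbb{Z}^n$ is any ideal lattice, then
$$\sigma^{-1}(L(T_h^*(g))) = \{\sigma^{-1}(b) \mid b \in L(T_h^*(g))\} = (g(x)) \subset R,$$
where $g(x) = g_0 + g_1x + \cdots + g_{n-1}x^{n-1} = \sigma^{-1}(g)$.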


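Proof We first prove (i). Let $g(x) = g_0 + g_1x + \cdots + g_{n-1}x^{n-1} \in R$ be a given polynomial and $N = (g(x)) \subset R$ the principal ideal generated by $g(x)$ in $R$. By (7.182),

$$\sigma(g(x)) = \begin{pmatrix} g_0 \\ g_1 \\ \vdots \\ g_{n-1} \end{pmatrix} = T_h^*(g)\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in L(T_h^*(g)).$$

And because $x^n = -(a_{n-1}x^{n-1} + \cdots + a_1x + a_0)$ in $R$,

$$\begin{aligned} xg(x) &= g_{n-1}x^n + g_{n-2}x^{n-1} + \cdots + g_1x^2 + g_0x \\ &= (g_{n-2} - g_{n-1}a_{n-1})x^{n-1} + (g_{n-3} - g_{n-1}a_{n-2})x^{n-2} + \cdots + (g_0 - g_{n-1}a_1)x - g_{n-1}a_0, \end{aligned}$$

so

$$\sigma(xg(x)) = \begin{pmatrix} -g_{n-1}a_0 \\ g_0 - g_{n-1}a_1 \\ \vdots \\ g_{n-2} - g_{n-1}a_{n-1} \end{pmatrix} = T_hg = T_h^*(g)\begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \in L(T_h^*(g)).$$

For the same reason (applying this computation repeatedly), for $0 \le k \le n - 1$ we have

$$\sigma(x^kg(x)) = T_h^kg = T_h^*(g)e_{k+1} \in L(T_h^*(g)),$$

where $e_{k+1}$ is the column vector whose $(k+1)$-th component is 1 and whose other components are 0. Suppose $f(x) \in N = (g(x))$; then $f(x) = b(x)g(x)$, where $b(x) = b_0 + b_1x + \cdots + b_{n-1}x^{n-1}$, and we have

$$\sigma(f(x)) = \sigma(b(x)g(x)) = \sum_{k=0}^{n-1} b_k\,\sigma(x^kg(x)) = T_h^*(g)\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix} \in L(T_h^*(g)). \tag{7.185}$$

This proves

$$\sigma(N) = \sigma((g(x))) \subset L(T_h^*(g)).$$

Conversely, for any lattice point $\alpha \in L(T_h^*(g))$, we have

$$\alpha = T_h^*(g)b = T_h^*(g)\begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix}, \quad b \in \mathbb{Z}^n.$$

Since $\sigma$ is a one-to-one correspondence, by (7.185),

$$f(x) = \sigma^{-1}(\alpha) = \sigma^{-1}(T_h^*(g)b) = b(x)g(x) \in N = (g(x)),$$

where $b(x) = \sigma^{-1}(b)$. So we have

$$\sigma(N) = \sigma((g(x))) = L(T_h^*(g)).$$

Thus (i) holds. Again, since $\sigma$ is a one-to-one correspondence, (ii) follows directly. This completes the proof of Theorem 7.10.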
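Equation (7.185), the heart of Theorem 7.10, says that multiplying by $g(x)$ in $R = \mathbb{Z}[x]/(h(x))$ is the same as applying the ideal matrix $T_h^*(g)$ to coefficient vectors. The sketch below compares both sides for random integer vectors; $h(x) = x^4 + 3x^3 - x + 2$ and the helper names are illustrative assumptions.

```python
import random


def rotation_matrix(a):
    """T_h of (7.180) for h(x) = x^n + a[n-1] x^{n-1} + ... + a[0]."""
    n = len(a)
    T = [[0] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = 1
    for i in range(n):
        T[i][n - 1] = -a[i]
    return T


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def ideal_matrix(a, g):
    """Rows of T_h^*(g) = [g, T_h g, ..., T_h^{n-1} g], cf. (7.183)."""
    T, cols, v = rotation_matrix(a), [], list(g)
    for _ in range(len(g)):
        cols.append(v)
        v = matvec(T, v)
    return [list(row) for row in zip(*cols)]


def mul_mod_h(b, g, a):
    """sigma(b(x)g(x) mod h(x)) over Z; b and g are coefficient lists of length n."""
    n = len(a)
    prod = [0] * (2 * n - 1)
    for i, bi in enumerate(b):
        for j, gj in enumerate(g):
            prod[i + j] += bi * gj
    # eliminate x^deg for deg >= n using x^n = -(a_{n-1}x^{n-1} + ... + a_0)
    for deg in range(2 * n - 2, n - 1, -1):
        c, prod[deg] = prod[deg], 0
        for i in range(n):
            prod[deg - n + i] -= c * a[i]
    return prod[:n]


a = [2, -1, 0, 3]                  # illustrative h(x) = x^4 + 3x^3 - x + 2
n = len(a)
rng = random.Random(1)
g = [rng.randrange(-5, 6) for _ in range(n)]
b = [rng.randrange(-5, 6) for _ in range(n)]
lhs = mul_mod_h(b, g, a)                    # sigma(b(x) g(x)) in R
rhs = matvec(ideal_matrix(a, g), b)         # T_h^*(g) b, cf. (7.185)
print(lhs == rhs)                           # True
```

Taking `b = [0, 1, 0, 0]` reproduces the step $\sigma(xg(x)) = T_hg$ from the proof.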


The above discussion of ideal matrices and ideal lattices can be extended to the finite field $\mathbb{Z}_q$, because every quotient ring $\mathbb{Z}_q[x]/(h(x))$ of the polynomial ring $\mathbb{Z}_q[x]$ over a finite field is a principal ideal ring. Therefore, we can establish a one-to-one correspondence between all ideals in $R = \mathbb{Z}_q[x]/(h(x))$ and the linear codes over $\mathbb{Z}_q$ spanned by the columns of ideal matrices $T_h^*(g)$.

Back to the Ajtai/Dwork cryptosystem: let $h(x) \in \mathbb{Z}_q[x]$ be a given monic polynomial of degree $n$, and select an $n \times m$ matrix $A \in \mathbb{Z}_q^{n\times m}$ as a generalized ideal matrix, i.e.,

$$A = [A_1, A_2, \ldots, A_{m/n}], \tag{7.186}$$

where $A_i$ $(1 \le i \le m/n)$ is the ideal matrix generated by $g^{(i)} \in \mathbb{Z}_q^n$, that is

$$A_i = T_h^*(g^{(i)}) = [g^{(i)}, T_hg^{(i)}, \ldots, T_h^{n-1}g^{(i)}]. \tag{7.187}$$

We obtain the second algorithm of the Ajtai/Dwork cryptosystem:

Algorithm 2: Hash function based on ideal lattice.

Parameters: $q, n, m, d$ are positive integers, $n \mid m$, $m\log d > n\log q$.

Secret key: $m/n$ column vectors $g^{(i)} \in \mathbb{Z}_q^n$ $(1 \le i \le m/n)$ and a polynomial $h(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0 \in \mathbb{Z}_q[x]$.

Hash function $f_A: \{0, 1, \ldots, d - 1\}^m \to \mathbb{Z}_q^n$, defined as

$$f_A(y) = Ay \pmod q,$$

where the generalized ideal matrix $A \in \mathbb{Z}_q^{n\times m}$ is given by Eq. (7.186).
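Combining (7.186) with Theorem 7.10, the hash of Algorithm 2 can be computed either with the $n \times m$ matrix $A$ or, block by block, as $\sum_i g^{(i)}(x)y_i(x) \bmod (h(x), q)$, where $y_i$ is the $i$-th length-$n$ block of $y$. The sketch below checks that both routes agree; $h(x) = x^4 + 1$, the toy parameters and the function names are hypothetical choices, not recommendations.

```python
import random


def rotation_matrix(a):
    """T_h of (7.180) for h(x) = x^n + a[n-1] x^{n-1} + ... + a[0]."""
    n = len(a)
    T = [[0] * n for _ in range(n)]
    for i in range(1, n):
        T[i][i - 1] = 1
    for i in range(n):
        T[i][n - 1] = -a[i]
    return T


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def ideal_matrix(a, g):
    """Rows of A_i = T_h^*(g) = [g, T_h g, ..., T_h^{n-1} g], cf. (7.187)."""
    T, cols, v = rotation_matrix(a), [], list(g)
    for _ in range(len(g)):
        cols.append(v)
        v = matvec(T, v)
    return [list(row) for row in zip(*cols)]


def mul_mod_h(b, g, a):
    """Coefficients of b(x)g(x) mod h(x) over Z."""
    n = len(a)
    prod = [0] * (2 * n - 1)
    for i, bi in enumerate(b):
        for j, gj in enumerate(g):
            prod[i + j] += bi * gj
    for deg in range(2 * n - 2, n - 1, -1):      # use x^n = -(a_{n-1}x^{n-1} + ... + a_0)
        c, prod[deg] = prod[deg], 0
        for i in range(n):
            prod[deg - n + i] -= c * a[i]
    return prod[:n]


def hash_matrix(a, keys, y, q):
    """f_A(y) = Ay mod q with A = [A_1, ..., A_{m/n}] as in (7.186)."""
    n, out = len(a), [0] * len(a)
    for i, g in enumerate(keys):
        part = matvec(ideal_matrix(a, g), y[i * n:(i + 1) * n])
        out = [u + v for u, v in zip(out, part)]
    return [x % q for x in out]


def hash_poly(a, keys, y, q):
    """Same value via Theorem 7.10: sum of g^(i)(x) y_i(x) mod (h(x), q)."""
    n, out = len(a), [0] * len(a)
    for i, g in enumerate(keys):
        part = mul_mod_h(y[i * n:(i + 1) * n], g, a)
        out = [u + v for u, v in zip(out, part)]
    return [x % q for x in out]


# hypothetical toy parameters; h(x) = x^4 + 1 is an illustrative choice
n, m, q, d = 4, 24, 17, 2
a = [1, 0, 0, 0]
rng = random.Random(2)
keys = [[rng.randrange(q) for _ in range(n)] for _ in range(m // n)]   # g^(1), ..., g^(m/n)
y = [rng.randrange(d) for _ in range(m)]
print(hash_matrix(a, keys, y, q), hash_matrix(a, keys, y, q) == hash_poly(a, keys, y, q))
```

The polynomial route never forms $A$, which is the practical benefit of using structured (cyclic or ideal) matrices.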

We will not discuss the collision resistance of hash functions constructed from cyclic lattices and ideal lattices here. Interested readers can refer to Micciancio and Regev (2009) in the references of this chapter.


Exercise 7 


1. Let $L \subset \mathbb{R}^n$ be a lattice (full-rank lattice) and $L^*$ the dual lattice of $L$. Show that the integer lattice $\mathbb{Z}^n$ is self-dual, that is, $(\mathbb{Z}^n)^* = \mathbb{Z}^n$. Let $L = 2\mathbb{Z}^n$; find $L^*$.

2. Is it true that $L$ is a self-dual lattice if and only if $L = \mathbb{Z}^n$? Why?

3. Under the assumption of Exercise 1, let $\lambda_1(L)$ be the length of the shortest vector of $L$ and $\lambda_1(L^*)$ the length of the shortest vector of the dual lattice $L^*$. Prove

$$\lambda_1(L)\lambda_1(L^*) \le n.$$

4. Let $\lambda_1(L), \lambda_2(L), \ldots, \lambda_n(L)$ be the successive minima of the lattice $L$. Prove

$$\lambda_1(L)\lambda_n(L^*) \ge 1.$$
5*. Let $L$ be a lattice, $B = [\beta_1, \beta_2, \ldots, \beta_n]$ a generating matrix of $L$, and $B^* = [\beta_1^*, \beta_2^*, \ldots, \beta_n^*]$ the corresponding Gram-Schmidt orthogonal matrix. Prove: any lattice $L$ has a basis $\{\beta_1, \beta_2, \ldots, \beta_n\}$ such that
1
4805 min(5j |. 183l; --- 1851] 24 Q2.
(Hint: use a KZ basis of the lattice $L$.)
6. Under the assumption of Exercise 5, let $\lambda_1(L), \lambda_2(L), \ldots, \lambda_n(L)$ be the successive minima of the lattice $L$; prove:


jzizn 


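7. For a full-rank lattice $L \subset \mathbb{R}^n$, define its covering radius $\mu(L)$ as

$$\mu(L) = \max_{x \in \mathbb{R}^n} |x - L|,$$

where $|x - L|$ denotes the distance from $x$ to $L$. Prove that the covering radius of any lattice $L$ exists.
8. Prove: $\mu(\mathbb{Z}^n) = \frac{1}{2}\sqrt{n}$.
9. For any lattice $L \subset \mathbb{R}^n$, prove: $\mu(L) \ge \frac{1}{2}\lambda_n(L)$.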
10. For any lattice $L \subset \mathbb{R}^n$, prove the following theorem:

$$\lambda_1(L) \cdot \mu(L^*) \le n.$$